eNewsletter 3
Volume VIII, Number 1, January, 2008




Happy New Year!


And welcome back to work. As most of you know, SemCo™ always closes for two weeks at the end of the year, so for us it's really coming back. While we do check email and phone calls, we do plan to spend the time with family and friends and don't even pretend that this is time to "get a lot done." So, now we're ready to get a lot done!

And, of course, there's a lot to do. I am checking the blogs and articles on "hot trends for 2008" and the like. Check out my compilation of what these are saying. One basic tone – even with questions about the overall economy, most IT gurus expect 2008 to be a good year for IT with salaries going up, new jobs opening up, and exciting technologies growing in use.

Here's the schedule (or you can view the complete schedule on our website).

CSTA Web sessions:
January 9, 10
February 6, 7
March 5, 6

CSTA Classroom session:
DC area - February 20
Atlanta - February 27

UITJ (Understanding IT Jobs) Web sessions:
January 10
March 6

TR Web sessions:
January 23

Keep in touch and keep up with technology!


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TechKnowledge


Data Quality - More Important Than Ever

This one's simple. When you get more data, quality gets more important. It's easy to keep data quality high when you have a small amount of information – you can actually hand-check each piece of information. But when we start talking about terabytes and petabytes of data, we've got problems. The first problem is to actually define what we mean by "quality."

We can start with data assurance, which is the term most often used instead of data quality. Data assurance says that data must be consistent (all aspects of the data must be available); current, or timely; accurate, or correct; valid (important to the business); and unique (no duplicates). This definition falls into the category "easier said than done."

Consistency – having all aspects of data available at the same time – is often difficult because the data is located in different sources. No longer does a single database hold all the information needed for an application, so data integration (using data from multiple sources) is critical. MDM (Master Data Management) is growing in use and is often described as a separate data layer of information. With MDM, a definition of all aspects of the data is created, and data – regardless of the source – must conform to this definition.

Consistency must be accompanied by currency. The master data must ensure that data from different sources is from the same time or time period. Currency doesn't just mean the latest data. In fact, many compliance regulations require that data always be associated with an accurate date. It's not enough to know we had, e.g., 567 employees working full-time; we have to know what time period that data refers to. Data dictionaries, MDM, and metadata can help to achieve currency.

Accuracy, or correctness, is handled in many ways. Data validation is part of every system, and includes data profiling, a data integration technique that examines a data store and collects statistics and information including:
• A count and percent of null data
• A count and percent of unique data
• Maximum and minimum values
• Top and bottom values by frequency
• Ranking levels and count intervals.
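
For readers who like to see the idea in code, here is a minimal sketch of the first four statistics in the list above, written in Python. It is only an illustration: the function name profile_column, the sample ages, and the choice of None to represent a null are all our assumptions, and real profiling tools collect far more than this.

```python
from collections import Counter

def profile_column(values, top_n=3):
    """Collect simple profile statistics for one column of data.

    `values` is a list in which None stands for a null entry (an assumption
    made for this sketch).
    """
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_count = total - len(non_null)
    unique_count = len(set(non_null))
    # most_common() lists (value, count) pairs, most frequent first
    by_frequency = Counter(non_null).most_common()

    def percent(count):
        return 100.0 * count / total if total else 0.0

    return {
        "null_count": null_count,
        "null_percent": percent(null_count),
        "unique_count": unique_count,
        "unique_percent": percent(unique_count),
        "minimum": min(non_null) if non_null else None,
        "maximum": max(non_null) if non_null else None,
        "top_values": by_frequency[:top_n],
        "bottom_values": by_frequency[-top_n:],
    }

# Hypothetical sample column: customer ages with two missing entries
ages = [34, 41, None, 34, 29, 34, None, 41, 57, 29]
age_profile = profile_column(ages, top_n=2)
print(age_profile)
```

Percentages here are measured against all entries, nulls included; a profiling tool may define them differently.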

This kind of information is augmented by pattern analysis, relationship rules, and metadata validation. All of this is used to ensure the accuracy of data. Once a profile has been built, new data must fit the profile, or be questioned. Obviously the profiles must be continually updated to ensure they are accurate, current, and consistent. Data validity is an extension of accuracy, and is handled by the same techniques.
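
The "fit the profile, or be questioned" step can be sketched in a few lines of Python. Everything named here is hypothetical: the function questions_for, the profile layout (allow_null, minimum, maximum, pattern), and the two sample profiles are assumptions made for illustration, not the format of any particular product.

```python
import re

def questions_for(value, profile):
    """Return the reasons a new value does not fit a profile.

    An empty list means the value fits. The profile keys used here
    are hypothetical names chosen for this sketch.
    """
    reasons = []
    if value is None:
        if not profile.get("allow_null", False):
            reasons.append("null value not expected")
        return reasons
    pattern = profile.get("pattern")
    if pattern is not None and not re.fullmatch(pattern, str(value)):
        reasons.append("does not match the expected pattern")
    low, high = profile.get("minimum"), profile.get("maximum")
    if low is not None and value < low:
        reasons.append("below the minimum seen so far")
    if high is not None and value > high:
        reasons.append("above the maximum seen so far")
    return reasons

# Hypothetical profiles: a numeric range and a text pattern
age_field_profile = {"allow_null": True, "minimum": 29, "maximum": 57}
account_field_profile = {"allow_null": False, "pattern": r"[A-Z]{2}-\d{6}"}

print(questions_for(41, age_field_profile))
print(questions_for(130, age_field_profile))
print(questions_for("fl-12", account_field_profile))
```

Note that a value outside the profile is questioned, not rejected. It may be an error, or it may be a sign that the profile itself needs updating.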

Uniqueness is really important when dealing with massive amounts of data. It's all too easy to enter duplicate data into a system. If you're building a data warehouse from operational databases, what do you do when you have a record for John Jones in your checking account database and a record for Jonathan Jones in your savings account database? Is this one customer with two accounts, or two customers? Deduplication eliminates duplicate data by merging duplicate information together. Duplicates can be identified by matching on criteria including phonetic similarity, direct word match, telephone or fax number, and name initials. And all of this applies to structured data, which is data found in databases and files. Today it's estimated that 80% of corporate data is unstructured and found in emails, procedure manuals, Web sites, etc. Unstructured data is even more difficult to work with and really increases the problem of ensuring uniqueness of the data.
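
To make the John Jones question concrete, here is a rough Python sketch of the matching criteria mentioned above. For the phonetic match it uses Soundex, one common phonetic code; that choice is ours, and the record layout, function names, and phone numbers are all made up for illustration.

```python
# Soundex letter groups: letters in the same group sound alike
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit

def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        previous = code  # a vowel resets this, so a repeat after a vowel counts
    return (encoded + "000")[:4]

def digits_only(phone):
    return "".join(ch for ch in phone if ch.isdigit())

def matching_evidence(a, b):
    """List the criteria on which two customer records agree."""
    evidence = []
    if a["last"].lower() == b["last"].lower():
        evidence.append("direct word match on last name")
    if soundex(a["last"]) == soundex(b["last"]):
        evidence.append("phonetic match on last name")
    if a["first"][:1].upper() == b["first"][:1].upper():
        evidence.append("matching first initial")
    phone_a, phone_b = digits_only(a["phone"]), digits_only(b["phone"])
    if phone_a and phone_a == phone_b:
        evidence.append("matching telephone number")
    return evidence

# Hypothetical records from two operational databases
checking = {"first": "John", "last": "Jones", "phone": "(407) 555-0123"}
savings = {"first": "Jonathan", "last": "Jones", "phone": "407-555-0123"}
evidence = matching_evidence(checking, savings)
print(evidence)
```

The sketch only gathers evidence. Deciding how much evidence is enough to merge two records is a business rule, and it's usually the hard part.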

We all rely on our data to make our corporate decisions – ranging from which products should lead off the ad for the Anniversary Sale to whether we should start selling pet insurance. We need data assurance to make the correct decisions! As companies get more and more dependent on BI (Business Intelligence) and BPM (Business Process Management), the time invested in data assurance gets more and more important.



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TechCheck


1. Are you part of Generation V?

2. In what area of IT are people talking ACID?

3. And, where does CIA pop in?

4. True or false:
Desktop and laptop computers have Windows Vista pre-installed as the operating system.

5. Which of the following does not belong:
a. Fedora
b. Gentoo
c. SLED
d. Ubuntu



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Hot Trends for 2008


As I said earlier, I'm busy reading all the blogs and articles from IT gurus who, as always at the first of the year, ring in on what they think is going to be hot in the new year. Here's a compilation of the existing technologies that these authors expect to be dominant.

1. Virtualization. This appears, either as virtualization or as on-demand, on most lists. Multiple dedicated servers are being replaced with virtual servers sharing network-based resources such as common storage. This not only lets businesses reduce the cost of infrastructure, but also allows them to respond quickly to changing business needs.

2. Green IT. Green IT (cutting down the carbon footprint in the data center) is an economic push, and also a reaction to regulations! Companies might want to do this, but they also have to do it. From IBM to small companies, going green is even appearing in mission statements.

3. Collaboration – in all ways. We hear the most about collaborative development, but remember this includes virtual meetings. For all kinds of reasons (maybe starting with how much fun travel can be) online meeting systems are popping up all over.

4. The growth of Web 2.0 technologies. More mashup tools are appearing, including tools that are intended to be used by non-technical staff (a mashup is a Web site that is created by pulling together other sites, or functions from other sites). This includes the growth of social networking in the corporate world. Picture companies having their own Facebook or MySpace, and internal wikis to be used by employees to network with each other.

5. The growth of PDAs, specifically smartphones. The Google phone should come out this year, but the big news is companies moving their applications to the small devices, with the small screens.

Next month I'll address some of the newer technologies that will start to shape the changes in IT over the next few years.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Short Vocabulary


???? Computers

We have to start off with the topic itself – there's a whole new category of computers. We're talking about those computers that are bigger than a PDA, but smaller than a notebook, or laptop. Products have been appearing in this area for many years, but most have been unsuccessful. This year, according to some pundits, is going to be the year this area gets filled. Some of the potential winners, and the technology used with these systems:

clamshell: Personal computer technology. A design for handheld systems in which the device opens to reveal the screen and operational controls.

flash storage: Variation of EEPROM (Electrically Erasable and Programmable Read Only Memory). Information is burned into memory in blocks, not individual bytes, and memory can be erased and reprogrammed as often as needed. Flash memory provides the best speeds and cost ratios, and is the most popular type of ROM being used today. Includes two types:
NAND: Flash memory used in memory cards, USB flash drives, and MP3 players. Also provides the image storage for digital cameras. Used in devices requiring high-capacity data storage, and offers faster erase and write capabilities than the NOR architecture. Software cannot execute in NAND Flash and must be moved to RAM for execution, so NOR Flash is more commonly used to store code. Introduced by Toshiba in 1989.
NOR: Flash memory used in devices that store and run code, usually in small capacities. Has fast read capabilities but slow write and slow erase functions compared to the NAND architecture. NOR technology is more commonly found in embedded designs, lower-end set-top boxes, mobile handsets, and BIOS chips. Introduced by Intel in 1988.

handheld computers, PDAs: Lightweight (1 pound and less), task-specific computers. These machines can be pen- or stylus-based, offer voice recognition, fax and modem communication, and include a pager. Typical use: write in "lunch - John" and the system will send John a fax and enter the lunch date in the user's appointment book. Many offer connections to the Internet for email and Web surfing. Also called PDAs (Personal Digital Assistants).

notebook computer: Notebook computers are portable computers that can weigh as little as three pounds. Most notebooks can plug into larger computers, which gives the user the capability of working away from the office and simply uploading whatever information is collected or processed while on the road. Most are close to the size of a standard notebook (8 1/2" x 11") and have internal hard disks, modems, and CD and diskette drives. The smallest of these systems, the sub-three-pound ones, are called ultraportable notebooks. Also called laptops.

Tablet PC: Desktop computer, or PC (Personal computer). Built as a flat tablet that resembles a small chalkboard. Has a touch screen and accepts pen input. Some tablets offer functionality almost equivalent to full PCs, and others provide only Internet access and basic functions such as calendaring and address books. Some need to connect to phone lines and/or network cables while others provide wireless access. Pure tablets (or slate tablets) work with a pen or stylus only, and convertible tablets have an add-on keyboard and can function as a regular laptop.

Ultramobile PC: Mobile PC. Referred to as an "ultramobile" PC, it's smaller than a tablet PC, yet larger than a PDA (handheld computer). The devices provide the functionality of a tablet PC, including a touch screen, the ability to take notes on the screen, and video and Internet functions, with both Wi-Fi and Bluetooth support. The PCs can be produced by any manufacturer, use Intel processors, and run Windows XP Tablet Edition. Also called Origami. Devices from three different vendors were announced in March 2006.

Ultraportable: Notebook computer. Smaller than typical notebooks, with the size often around 10 inches by 8 inches and the weight around three pounds. Made for business travelers, these systems have most of the capabilities of a full notebook, but battery life is usually limited and the keyboard is cramped and not good for lengthy use.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Answers to TechCheck


1. Generation V is a reference to the latest group of consumers who are identified by what they do, rather than by when they were born. Generation Virtual, or V, is made up of people from multiple age groups who make social connections online – through virtual worlds, in video games, as bloggers, in social networks or through posting and reading user-generated content at e-commerce sites such as reviews on Amazon.com. These people are drawn to the Internet's "flat meritocracy," in which people can gain status and acknowledgment through ways not generally available in the physical world, such as providing advice or recommendations or excelling at a video game.

2. The database world. ACID properties are:
Atomicity: A series of database operations is treated as a single unit: either all of the operations occur, or none of them do. When the operations conclude, the whole series is either committed or canceled (rolled back).
Consistency: All operations must follow the integrity rules. For example, if a rule states that all accounts must have a positive balance, then any operation violating this rule will cause the transaction to be aborted.
Isolation: Operations in a transaction must appear isolated from all other operations. This means that no operation outside the transaction can ever see the data in an intermediate state.
Durability: Once the user has been notified of success, the transaction will not be undone. This means it will survive system failure, and that the database system has checked the integrity constraints and won't need to abort the transaction.
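
Atomicity and consistency are easy to see in a small experiment. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table, account names, and amounts are made up for illustration. The CHECK constraint plays the role of the positive-balance rule in the example above. Because the database lives only in memory, the sketch says nothing about durability, and with a single connection it doesn't exercise isolation either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"  # the integrity rule
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("checking", 100), ("savings", 50)],
)
conn.commit()

def transfer(conn, source, target, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The rule was violated, so the whole transaction was canceled,
        # including the credit that had already been applied.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "checking", "savings", 30)         # allowed by the rule
rejected = transfer(conn, "checking", "savings", 500)  # would overdraw checking
print(ok, rejected, balances(conn))
```

The credit is deliberately applied before the debit, so the second transfer shows an operation that had already happened being undone along with the rest of the series.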

3. As far as IT goes, CIA is part of security – it's a benchmark for evaluating security systems:
Confidentiality: Limiting access to information to authorized users. Includes protections against malicious software, spam, spyware, and phishing attacks.
Integrity: The trustworthiness of information resources. Ensuring that data has not been changed, either maliciously or by accident, and ensuring the accuracy of the data.
Availability: Data is available when needed. Includes physical availability: ensuring that the hardware is working, and protecting against both natural problems (wind, water, etc.) and human causes, accidental or deliberate.
CIA is part of both internal and external regulations and standards, including HIPAA.
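
One small, concrete piece of the Integrity item is detecting that data has changed at all. A common way to do that is to record a cryptographic fingerprint (hash) of the data and compare it later. Here is a minimal Python sketch using the standard hashlib module; the function names and the sample data are ours, for illustration only.

```python
import hashlib
import hmac

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 fingerprint of some data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_unchanged(data: bytes, recorded_fingerprint: str) -> bool:
    """Check data against a fingerprint recorded earlier."""
    return hmac.compare_digest(fingerprint(data), recorded_fingerprint)

# Hypothetical example: record a fingerprint, then check two copies
original = b"567 employees working full-time"
recorded = fingerprint(original)

print(is_unchanged(original, recorded))
print(is_unchanged(b"576 employees working full-time", recorded))
```

A plain fingerprint like this catches accidental changes. It only helps against deliberate tampering if the recorded fingerprint is kept somewhere the tamperer can't also change it.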

4. False: Dell and Lenovo are now shipping PCs pre-installed with Linux.

5. We're sticking with Linux for another question. b) doesn't belong: Gentoo is not desktop Linux. It's a developers' edition which allows users to create both server and desktop Linux systems.



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Privacy Policy


SemCo Enterprises, Inc. respects your privacy. We do not sell, rent or share your information with anyone.

   
   
SemCo's Newsletter

TechConnections is SemCo's free monthly newsletter that features important IT articles and a unique perspective on IT for the non-technical professional.


   
Teaser
What was the hottest product in 2007?


TechConnections Archived Editions

If you receive the Text version of this newsletter and you'd like to view it in HTML, join our Resources membership, then click on "Register Today."



If you have a technical question while reading TechConnections or if you would like to make a suggestion, send us a quick email - we'll respond, usually within 24 hours!

Contact us at:

SemCo Enterprises, Inc.
P. O. Box 181265
Casselberry, FL 32718-1265
407.574.6759
semco@semcoenterprises.com
http://www.semcoenterprises.com

Copyright © 2008 SemCo Enterprises, Inc. All Rights Reserved (but feel free to quote it, think about it and forward to others.)
