December – holidays, vacations, decorations, music, food, relatives, mail, packages. And, a little bit of work thrown in there to finish the year.
To finish this year, I want to thank all of you for being part of SemCo's world. You all know how much I love all this techie stuff. That means when I learn something new I just have to share it, and I picture you all when I tweet, blog, or update the seminars and TechRef™®.
It's been a great year in many ways and I hope all of you finish it with a warm and wonderful holiday season. My – our – best to you and your families, and here's a toast to a great new year for everyone.
Susan, Cheryl, and Peggy
Here's the schedule or you can view the complete schedule on our Website:
CSTA Web sessions:
December 15, 16
January 19, 20
March 2, 3
UITJ (Understanding IT Jobs) Web sessions:
TR Web sessions:
Keep in touch - I love hearing from you - and keep up with technology!
MDM (Master Data Management)
MDM (Master Data Management) has become pretty important. As companies accumulate more and more data, in more and more databases…
This is really basic to the problems of so much data. We might have terabytes (trillions of bytes) of data, but how much UNIQUE data do we have? We have all sorts of redundant data, meaning the same data appears in more than one location (e.g., in more than one database). We have customer names and addresses in the Sales database, the Customer database, the Billing database, the Employee database (because some of our employees are also our customers), and perhaps in the Collections database. Picture yourself as that customer. What happens if you move? If you're a good customer, you could go to the company Web site and provide the company with your new address. If you pay your bills manually (write a check and put it in the reply envelope provided), you can put your new address in the space provided. You could phone the company; most menus have a choice that lets you change personal information. So, you get the info to the company. But you're only going to do it once. Not two, three, or even five times. So it's up to the company to make sure all five databases get updated – correctly – within a reasonable period of time. What are the chances?
Another look at the problem – have you gotten marketing materials from your bank trying to sell you a mortgage? Even though your mortgage is with this bank? Of course you have. The marketing system has information about you – it has your name and address. But, it doesn't have information about your accounts. The marketing system naturally goes to the customer data to sell mortgages as the best place to get new business is with existing customers. You get the (junk) mail. This is where MDM comes into the picture.
First of all, let's address the synonyms. Master Data Management is also called Reference Data. But it has older synonyms. Years ago companies started talking about federated data, or federated databases. A federated database is one that accesses data from multiple loosely connected databases. This is exactly what we're talking about, although MDM actually moves the data from multiple databases to a hub.
Master data is built around a specific type of data, and customer data is, in fact, the most common type. It even has its own acronym: CDI (Customer Data Integration). Next would be product data (with PIM – Product Information Management). Other types can be built – location, order, vendor, etc. Just remember, it's going to be one type of data.
The master data is populated by integrating the data from the multiple sources. This means we have to include the address – but which address? All of our databases have, at best, slightly different addresses. In one, the street address could be 1 Main Street, the next could say 1 Main St, the next 1 Main, and …. We know these are identical, but a computer does not. This brings up functions including merging, de-duplication, cleansing (or scrubbing), and standardizing. We will get that address into a hub (master data is stored in a hub), or point to the choice we made (a federated database). In either case we've got a lot of work to do. Some, or all, of the following will be used:
Standardizing – changing all variations to an agreed-upon standard, e.g., New York, NY will always be used, never NY, NY, or New York, New York.
Cleansing – validating data and putting the correct data into the hub. Our example could pick any of our street addresses as there is no "correct" choice, but that's not always the case. Is a customer's name Johnson or Johnsen?
De-duplication – eliminating duplicate data by merging duplicate information. Duplicates can be found with matching operators, including phonetic match, direct word match, telephone/fax number, and name initials.
Merging – builds the master data using all the above techniques.
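For the techies in the crowd, here is a minimal Python sketch of how standardizing, de-duplication, and merging might fit together for the 1 Main Street example. Everything in it is an assumption made up for illustration: the `STREET_WORDS` table, the `standardize` and `merge` functions, the customer name, and the rule that a missing street type means "Street". A real MDM product does far more than this.

```python
import re

# Hypothetical abbreviation table; a real MDM tool would use a far larger one.
STREET_WORDS = {"st": "Street", "st.": "Street", "street": "Street"}


def standardize(address: str) -> str:
    """Rewrite an address into one agreed-upon form."""
    words = re.sub(r"\s+", " ", address.strip()).split(" ")
    words = [STREET_WORDS.get(w.lower(), w.title()) for w in words]
    # Assumption for this sketch: a missing street type means "Street".
    if words[-1] != "Street":
        words.append("Street")
    return " ".join(words)


def merge(records: list[dict]) -> dict:
    """De-duplicate on (name, standardized address) and build the hub."""
    hub = {}
    for rec in records:
        key = (rec["name"].lower(), standardize(rec["address"]))
        entry = hub.setdefault(
            key, {"name": rec["name"], "address": key[1], "sources": []}
        )
        entry["sources"].append(rec["source"])
    return hub


# Three source databases holding the "same" customer in three spellings.
records = [
    {"source": "Sales", "name": "Pat Jones", "address": "1 Main Street"},
    {"source": "Billing", "name": "Pat Jones", "address": "1 Main St"},
    {"source": "Collections", "name": "pat jones", "address": "1  main"},
]
hub = merge(records)
```

The three source records collapse into a single hub entry that remembers which databases it came from. Cleansing (is it Johnson or Johnsen?) is left out on purpose, since that usually takes a trusted outside source or a person to decide.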
MDM (Master Data Management) is becoming more and more important as companies collect more and more data. Large companies are the biggest users (large companies = large amounts of data), and MDM has become basic to BPM (Business Process Management) systems, which are used to design and manage our business processes. These systems are just moving into the mainstream. MDM is also used with BI (Business Intelligence) and CRM (Customer Relationship Management), which are easily two of the most important systems in-house. In any house.
1. What are smartphones most used for?
2. What standard has become common in building BPM (Business Process Management) systems?
3. How many tablet computers are competing with Apple's iPad?
4. And just what is a social browser?
5. And what has Motorola done to smartphones now?
While updating our seminars is an ongoing activity, I do a formal look at the end of every year. This is when the surveys/predictions for what's going to happen in IT during the upcoming year start to appear. We're a little early on this, as I only found one "hot skills in 2011" survey so far, but just looking was inspirational, at least as far as UITJ (Understanding IT Jobs) goes!
I managed to revise the entire session and included much more emphasis on actual IT jobs. We'll spend more time talking about what techies actually do, and what words to look for on resumes and requirements. IT recruiting is tough, because you're not comparing apples to apples when you're looking for a match – you're comparing apples to oranges much of the time. This should make it easier. I'm really excited about the new look. And remember, it's included in our TR Program (Technical Recruiting) as well.
While TR isn't scheduled again in 2010, UITJ runs on December 16th. Check it out…
There's always something new. Make sure you keep up!
Short Database Vocabulary
As I say so many times, data is half the game in IT. We have no IT unless we have data to process, so what kinds of data do we have, how do we store and retrieve data and how do we measure data? We'll limit the discussion to databases, and this will be a good way to start 2011.
data Data is information. There are many types of data, and data in computer systems is encoded in a binary code. The most common code systems are ASCII (American Standard Code for Information Interchange) and EBCDIC (Extended Binary Coded Decimal Interchange Code). Data also exists as structured (stored in a defined manner in files and/or databases) or unstructured (data existing in memos, emails, reports, etc.). Data is most often thought of as text and perhaps graphics, but data also exists in audio and video formats.
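To see what "encoded in a binary code" looks like, here is a small Python sketch that encodes the same four letters in both systems. It assumes Python's built-in `cp037` codec as the EBCDIC example; `cp037` is just one of several EBCDIC code pages, so treat it as an illustration and not as the only form of EBCDIC.

```python
text = "DATA"

ascii_bytes = text.encode("ascii")    # ASCII encoding of the text
ebcdic_bytes = text.encode("cp037")   # cp037 is one common EBCDIC code page

for ch, a, e in zip(text, ascii_bytes, ebcdic_bytes):
    # Show each character with its 8-bit pattern in both code systems.
    print(f"{ch}: ASCII {a:08b}  EBCDIC {e:08b}")
```

The letters are the same, but the bit patterns are different, which is why data moving between systems that use different codes has to be translated.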
database appliance Database system that combines hardware and software - a DBMS (DataBase Management System) and an OS (Operating System). Usually both the hardware and software are provided by a single vendor, but "software appliances" also exist where the bundled database and operating system can run on any common hardware. A common use for these systems is data warehousing as they are usually scalable through terabytes of data. Because they are acquired as a single system, installation, maintenance, and support are simplified.
database middleware Communications, middleware. Software that allows client systems to request data from one or more databases through a common access API (Application Programming Interface). This is the most basic form of middleware.
database driver A program that allows access to a database. The program translates an application's queries to the command language of the database. ODBC (Open DataBase Connectivity) drivers allow access to SQL databases from programs written in different languages and JDBC (Java DataBase Connectivity) drivers allow access to SQL databases from programs written in Java. The drivers make it possible for a single application to access data from diverse databases such as Oracle, Sybase, etc. The major database vendors have written the database driver programs, and databases are described as being ODBC and/or JDBC compliant.
database server The software that runs in the host computer of a distributed processing system, or in the server computer of a client-server system that holds and manages the database. Provides data retrieval, storage, protection, and security functions.
database tuning Database processing. Managing and controlling the processing and performance of database activity. Includes monitoring response time, checking for bottlenecks, and performing load balancing. Activity of technical programmers. Also called performance tuning.
column-based database Database design. Similar to relational databases, but the structure is based on the columns, not the rows. This means data is retrieved by columns, putting all the like data together. Column-based databases are up to twenty times faster and require up to 90% less table storage space than traditional RDBMSs. Designed for read-intensive workloads such as data warehouses. Also called columnar database.
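A rough way to picture the difference, using plain Python lists and dictionaries in place of a real database: the sample customers and the `rows` and `columns` names are made up for this sketch, and it says nothing about how any particular product stores data on disk.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"customer": "Ann", "state": "NY", "amount": 120.0},
    {"customer": "Bob", "state": "NJ", "amount": 75.5},
    {"customer": "Cy", "state": "NY", "amount": 30.0},
]

# Column-oriented layout: each column is stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# A warehouse-style query ("total sales") touches only one column here,
# instead of reading every field of every row.
total_from_columns = sum(columns["amount"])
total_from_rows = sum(row["amount"] for row in rows)
```

Both totals are the same; the point is that the column layout keeps all the like data together, so a read-heavy query can skip the columns it does not need.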
federated database A federated database is a logical joining of physical databases so that all of the data can be treated as if it were in a single database. The actual databases share no resources and are connected only through software. These databases are also called virtual databases.
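As a loose analogy only, SQLite's `ATTACH DATABASE` statement lets one query span two separate databases. The sketch below uses Python's standard `sqlite3` module; the database roles, table names, and sample data are invented for illustration. A real federated database connects independent systems, often from different vendors, which this small example does not attempt.

```python
import sqlite3

# Two separate databases; in-memory here only to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")                     # plays the "Sales" database
conn.execute("ATTACH DATABASE ':memory:' AS billing")  # plays the "Billing" database

conn.execute("CREATE TABLE main.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 50.0), (1, 25.0), (2, 10.0)],
)

# One query spans both databases as if they were a single database.
result = conn.execute(
    """
    SELECT c.name, SUM(i.amount)
    FROM main.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
conn.close()
```

The data never moves: each table stays in its own database, and only the query treats them as one.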
in-memory database Database which stores all the data in the computer's memory using disks only for storing log and backup information. Can increase database performance by as much as 50 times.
key-value database Database architecture used in cloud databases. The database has domains instead of tables, and domains contain items. Items are defined by keys, which can have a dynamic set of attributes. Each item can have a unique schema and contains all the pertinent information about the item. A domain could contain customer items and order items and data is commonly duplicated between items. Data is accessed through APIs (Application Programming Interfaces) commonly following SOAP (Simple Object Access Protocol) or REST (Representational State Transfer) standards. These databases scale easily and dynamically and are good for document-oriented data and distributed scalability.
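Here is a toy version of the domain/item/key idea in Python, with a dictionary standing in for the domain. The `put` and `get` functions and the key names are hypothetical; a real cloud key-value store would expose this kind of access through a SOAP or REST API instead of local function calls.

```python
# A "domain" maps keys to items; each item carries its own attributes.
domain = {}


def put(key, **attributes):
    """Store or update an item; no fixed schema is enforced."""
    domain.setdefault(key, {}).update(attributes)


def get(key):
    """Fetch an item by key, or None if it does not exist."""
    return domain.get(key)


# Customer and order items live in the same domain with different attributes.
put("customer:1", name="Ann", city="New York, NY")
put("order:17", customer_name="Ann", total=120.0)  # customer data duplicated on purpose
put("customer:1", phone="555-0100")                # attributes can be added later
```

Notice that the customer name appears in both items. That duplication is the trade-off: there are no joins, so each item carries everything needed to use it.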
MVDB (MultiValue DataBase) A database that has relational properties, but uses files and records rather than tables. MVDBs are extremely flexible and fast. The design was originally used in the Pick database in the early 1970s. Current MVDBs include: Universe, UniData, D3 (new name of Pick), Reality, MaVerick, and jBase.
NoSQL database Database terminology. Describes a database that does not use SQL for access. In other words, any non-relational database. This includes databases based on Amazon's Dynamo key-value store and Google's BigTable, in addition to document databases (usually in JSON format) and graph databases such as those found in social networks. The term came into wide use in 2009.
object database Data collection that holds values and processing information. Must provide for inheritance, encapsulation, and polymorphism. Often used with multimedia applications, as objects can be audio, visual, and/or graphical data. Also called OODB (Object-Oriented DataBase), OODBMS (Object-Oriented DataBase Management System), ODBMS (Object DataBase Management System), and objectbase.
relational database Database structure. Data is stored in two-dimensional tables with rows and columns, and is accessed through SQL (Structured Query Language). Relational databases are the most common database used in IT today.
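A minimal sketch of a table with rows and columns being queried through SQL, using Python's standard `sqlite3` module with an in-memory database. The `customers` table and its contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A two-dimensional table: columns are defined once, rows are added as data.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ann", "New York"), ("Bob", "Newark"), ("Cy", "New York")],
)

# SQL describes which rows and columns are wanted, not how to find them.
new_yorkers = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", ("New York",)
    )
]
conn.close()
```

The same `SELECT` statement would look almost identical against most relational databases, which is a big part of why SQL skills show up on so many resumes.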
virtual database A virtual database allows users to access data in disparate databases, and perhaps even unstructured information stored in documents or e-mail messages from a single query. These databases would not require the data to be converted to a single format. Virtual databases are in their infancy, and IBM is the leader in this technology with DB2 Information Integrator providing this capability. IBM calls this database a federated database.
1. I was surprised – it's social networking. I would have guessed texting (somehow I did know it wasn't making phone calls!), but several sources say it's social networking. Remember, Twitter is considered social networking.
2. BPMN (Business Process Modeling Notation) has become the basis for a lot of BPM (Business Process Management) systems. It provides a graphical notation for diagramming business processes. These diagrams are called BPDs (Business Process Diagrams) and can be used by both technical and business people to communicate business process information.
3. There are two big ones: RIM's BlackBerry PlayBook and Samsung's Galaxy Tab. Tablet computers are quickly growing in use. Just as laptops largely replaced desktop systems, will tablets replace laptops?
4. New term! New term! New idea. A social browser is a regular browser which automatically incorporates social networking sites – specifically Twitter and Facebook. You can post status updates directly from the browser. The only one around isn't quite around yet – RockMelt is in beta.
5. The Droid Pro and Droid 2 Global now work globally. You can use your smartphone in over 200 countries. Business people who travel globally will be jumping on these – both released in November.
SemCo Enterprises, Inc. respects your privacy. We do not sell, rent or share your information with anyone.