OneStopTesting - Quality Testing Jobs, eBooks, Articles, FAQs, Training Institutes, Testing Software, Testing downloads, testing news, testing tools, learn testing, manual testing, automated testing, load runner, winrunner, test director, silk test, STLC

Forum| Contact Us| Testimonials| Sitemap| Employee Referrals| News| Articles| Feedback| Enquiry
 
Testing Resources
 
  • Testing Articles
  • Testing Books
  • Testing Certification
  • Testing FAQs
  • Testing Downloads
  • Testing Interview Questions
  • Career In Software Testing
  • Testing Jobs
  • Testing Job Consultants
  • Testing News
  • Testing Training Institutes
  •  
    Fundamentals
     
  • Introduction
  • Designing Test Cases
  • Developing Test Cases
  • Writing Test Cases
  • Test Case Templates
  • Purpose
  • What Is a Good Test Case?
  • Test Specifications
  • UML
  • Scenario Testing
  • Test Script
  • Test Summary Report
  • Test Data
  • Defect Tracking
  •  
    Software testing
     
  • Testing Forum
  • Introduction
  • Testing Start Process
  • Testing Stop Process
  • Testing Strategy
  • Risk Analysis
  • Software Listings
  • Test Metrics
  • Release Life Cycle
  • Interoperability Testing
  • Extreme Programming
  • Cyclomatic Complexity
  • Equivalence Partitioning
  • Error Guessing
  • Boundary Value Analysis
  • Traceability Matrix
  •  
    SDLC Models
     
  • Introduction
  • Waterfall Model
  • Iterative Model
  • V-Model
  • Spiral Model
  • Big Bang Model
  • RAD Model
  • Prototyping Model
  •  
    Software Testing Types
     
  • Static Testing
  • Dynamic Testing
  • Blackbox Testing
  • Whitebox Testing
  • Unit Testing
  • Requirements Testing
  • Regression Testing
  • Error Handling Testing
  • Manual support Testing
  • Intersystem Testing
  • Control Testing
  • Parallel Testing
  • Volume Testing
  • Stress Testing
  • Performance Testing
  • Agile Testing
  • Localization Testing
  • Globalization Testing
  • Internationalization Testing
  •  
    Test Plan
     
  • Introduction
  • Test Plan Development
  • Test Plan Template
  • Regional Differences
  • Criticism
  • Hardware Development
  • IEEE 829-1998
  • Testing Without a TestPlan
  •  
    Code Coverage
     
  • Introduction
  • Measures
  • Working
  • Statement Coverage
  • Branch Coverage
  • Path Coverage
  • Coverage criteria
  • Code coverage in practice
  • Tools
  • Features
  •  
    Quality Management
     
  • Introduction
  • Components
  • Capability Maturity Model
  • CMMI
  • Six Sigma
  •  
    Project Management
     
  • Introduction
  • PM Activities
  • Project Control Variables
  • PM Methodology
  • PM Phases
  • PM Templates
  • Agile PM
  •  
    Automated Testing Tools
     
  • Quick Test Professional
  • WinRunner
  • LoadRunner
  • Test Director
  • Silk Test
  • Test Partner
  • Rational Robot
  •  
    Performance Testing Tools
     
  • Apache JMeter
  • Rational Performance Tester
  • LoadRunner
  • NeoLoad
  • WAPT
  • WebLOAD
  • Loadster
  • OpenSTA
  • LoadUI
  • Appvance
  • Loadstorm
  • LoadImpact
  • QEngine
  • Httperf
  • CloudTest
  •  
    Languages
     
  • Perl Testing
  • Python Testing
  • JUnit Testing
  • Unix Shell Scripting
  •  
    Automation Framework
     
  • Introduction
  • Keyword-driven Testing
  • Data-driven Testing
  •  
    Configuration Management
     
  • History
  • What is CM?
  • Meaning of CM
  • Graphically Representation
  • Traditional CM
  • CM Activities
  • Tools
  •  
    Articles
     
  • What Is Software Testing?
  • Effective Defect Reports
  • Software Security
  • Tracking Defects
  • Bug Report
  • Web Testing
  • Exploratory Testing
  • Good Test Case
  • Write a Test
  • Code Coverage
  • WinRunner vs. QuickTest
  • Web Testing Tools
  • Automated Testing
  • Testing Estimation Process
  • Quality Assurance
  • The Interview Guide
  • Upgrade Path Testing
  • Priority and Severity of Bug
  • Three Questions About Bug
  •    
     
    Home » Testing Articles » Manual Testing Articles » Thoughts on Test Data

    Thoughts on Test Data

    A D V E R T I S E M E N T


    A number of RBCS clients find that obtaining good test data poses many
    challenges. For any large-scale system, testers usually cannot create sufficient
    and sufficiently diverse test data by hand; i.e., one record at a time. While datageneration
    tools exist and can create almost unlimited amounts of data, the data
    so generated often do not exhibit the same diversity and distribution of values as
    production data. For these reasons, many of our clients consider production data
    ideal for testing, particularly for systems where large sets of records have
    accumulated over years of use with various revisions of the systems currently in
    use, and systems previously in use.
    However, to use production data, we must preserve privacy. Production data
    often contains personal data about individuals which must be handled securely.
    However, requiring secure data handling during testing activities imposes
    undesirable inefficiencies and constraints. Therefore, many organizations want to
    anonymize (scramble) the production data prior to using it for testing.
    This anonymization process leads to the next set of challenges, though. The
    anonymization process must occur securely, in the sense that it is not reversible
    should the data fall into the wrong hands. For example, simply substituting the
    next digit or the next letter in sequence would be obvious to anyone�it doesn't
    take long to deduce that "Kpio Cspxo" is actually "John Brown"�which makes
    the de-anonymization process trivial.
    In addition, Kpio Cspxo and other similar nonsense scrambles make poor test
    data, because they are not realistic. The anonymization process must preserve the
    usefulness of the data for localization and functional testing, which often
    involves preserving its meaning and meaningfulness. For example, if the
    anonymization process changes "John Brown" to "Lester Camden," we still have
    a male name, entirely usable for functional testing. If it changes "John Brown" to
    "Charlotte Dostoyevsky," though, it has imposed a gender change on John, and
    if his logical record includes a gender field, we have now damaged the data.
    Preserving the meaning of the data has another important implication. It must be
    possible to construct queries, views, and joins of these anonymized data that
    correspond directly to queries, views, and joins of the production data. For
    example, if a query for all records with the first name "John" and the last name
    "Brown" returned 20 records against production data, a query for all records
    with the first name "Lester" and the last name "Camden" must return 20 records
    against anonymized data. Failure to honor this corollary of the meaning and
    meaningfulness requirement can result in major problems when using the data
    Thoughts on Test Data 2 Copyright � 2010, RBCS, All Rights Reserved
    for some types of functional tests, as well as any kind of performance, reliability,
    or load test.
    Even more challenging is the matter of usefulness of the data for interoperability
    testing. Consider three applications, each of which have data gathered over years
    and describing the same population. The data reside in three different databases.
    The three applications interoperate, sharing data, and data-warehousing and
    analytical applications can access the related data across databases. These
    applications can create a logical record for a single person through a de facto join
    via de facto foreign keys, such as full name, Social Security number, and so forth.
    If the anonymization process scrambles the data in such a way that these
    integrity constraints break, then the usefulness of the anonymized data for
    interoperability testing breaks. Meaningful end-to-end testing of functionality,
    performance, throughput, reliability, localization, and security becomes
    impossible in this situation. For many of our clients, preserving the usefulness of
    the data for interoperability testing poses the hardest challenge.
    In addition, the anonymization process must not change the overall data quality
    of the scrambled data. This is subtle, because most production data contains a
    large number of errors. Some have estimated the error rate as high as one in four
    records. So, to preserve the fidelity of the test data with respect to production
    data, the same records that have errors must continue to contain errors. These
    errors must be similar to the original errors, but must not allow reverse
    engineering of the original errors.
    A good test-data set has the property of maintainability, and so the anonymized
    data must also. Maintainability of test data means the ability to edit, add, and
    delete the data. This includes at the level of individual data fields and records,
    and, if applicable, across the logical records that might span multiple databases.
    To have the property of maintainability, the anonymization of the production
    data should not make maintenance of the data impossible, of course, but
    furthermore it should not make maintenance of the data any more difficult or
    time-consuming than maintenance of the production data.
    Two other practical challenges arise with the process of anonymization itself. The
    first is the time and effort required to carry out the anonymization process. One
    client told an RBCS consultant that they only refreshed their test data from
    production every 12 to 18 months, because the test-data refresh process,
    including the anonymization, required 4 to 6 person-months of effort and
    typically took an entire month to complete. In an organization where staff must
    charge the time spent on tasks to a particular project, few project managers felt
    compelled to absorb such a cost into their budgets.
    The next practical challenge of anonymization relates to the need to operate on
    quiescent data. In other words, the data cannot change during extraction of the
    to-be-anonymized data. This is nothing more complex than the usual challenge
    Thoughts on Test Data 3 Copyright � 2010, RBCS, All Rights Reserved
    of backing up databases, but the people involved in producing the anonymized
    test data must be aware of it.
    Options for production-data anonymization include both commercial and
    custom-developed tools. The selection of a data anonymization tool is like the
    selection of any other test tool. One must assemble a team, determine the tool
    options, identify risks and constraints for the project, evaluate and select the tool,
    and then roll out the tool. In this case, these activities would typically happen in
    the context of a larger project focused on creating test data entirely or in part
    through the anonymization of production data. Our experience with RBCS
    clients has shown that such a project requires careful planning, including
    identification of all requirements for the anonymized data and the
    anonymization process. An organization planning such a project should
    anticipate investing a substantial amount of time and perhaps even money
    (should commercial tools prove desirable). Trying to do a production-data
    anonymization project on the cheap is likely to result in failure to overcome
    many of the challenges discussed here. However, with careful planning and
    execution, it is possible for an organization to use anonymized production data
    for testing purposes.



    More Manual Testing Articles



    discussionDiscussion Center
    Discuss
    Discuss

    Query

    Feedback
    Yahoo Groups
    Y! Group
    Sirfdosti Groups
    Sirfdosti
    Contact Us
    Contact

    Looking for Software Testing eBooks and Interview Questions? Join now and get it FREE!
     
    A D V E R T I S E M E N T
       
       

    Members Login


    Email ID:
    Password:


    Forgot Password
    New User
       
       
    Testing Interview Questions
  • General Testing
  • Automation Testing
  • Manual Testing
  • Software Development Life Cycle
  • Software Testing Life Cycle
  • Testing Models
  • Automated Testing Tools
  • Silk Test
  • Win Runner
  •    
       
    Testing Highlights

  • Software Testing Ebooks
  • Testing Jobs
  • Testing Frequently Asked Questions
  • Testing News
  • Testing Interview Questions
  • Testing Jobs
  • Testing Companies
  • Testing Job Consultants
  • ISTQB Certification Questions
  •    
       
    Interview Questions

  • WinRunner
  • LoadRunner
  • SilkTest
  • TestDirector
  • General Testing Questions
  •    
       
    Resources

  • Testing Forum
  • Downloads
  • E-Books
  • Testing Jobs
  • Testing Interview Questions
  • Testing Tools Questions
  • Testing Jobs
  • A-Z Knowledge
  •    
    Planning
    for
    Study ABROAD ?


    Study Abroad


    Vyom Network : Free SMS, GRE, GMAT, MBA | Online Exams | Freshers Jobs | Software Downloads | Programming & Source Codes | Free eBooks | Job Interview Questions | Free Tutorials | Jokes, Songs, Fun | Free Classifieds | Free Recipes | Bangalore Info | GATE Preparation | MBA Preparation | Free SAP Training
    Privacy Policy | Terms and Conditions
    Sitemap | Sitemap (XML)
    Job Interview Questions | Placement Papers | SMS Jokes | C++ Interview Questions | C Interview Questions | Web Hosting
    German | French | Portugese | Italian