AI Archives | IT Business Edge

Enterprise Software Startups: What It Takes To Get VC Funding

While financial markets have rallied in recent weeks, many enterprise software companies are still trading at depressed levels. Losses of 50% or more over the past year are common; Okta, Twilio, and DocuSign are just a few examples.

This has also put tremendous pressure on startup funding. During the second quarter, venture capitalists (VCs) struck 24% fewer deals on a quarter-over-quarter basis, according to PitchBook, and the IPO market is having its worst year in a decade, squeezing funding even further.

“VCs are definitely getting more selective,” said Muddu Sudhakar, the CEO and founder of Aisera. “The bar is much higher now.”

As for his own firm, Sudhakar raised $90 million in a Series D round. Goldman Sachs led the round, and other investors included True Ventures, Zoom, and Khosla Ventures.

It helped that Aisera has a unique platform that leverages predictive AI for managing customer service, IT, and sales. The technology has proven effective at lowering operating costs.

Also read: 5 Top VCs For Data Startups

Getting Funded in a Down Market

So what are some other enterprise software startups that have been able to buck today’s tough environment? What are the factors for success in current markets?

Let’s take a look at a few success stories.

CleverTap: AI-based User Engagement

“The best way to attract investors is to build a growing and sustainable business,” said Sunil Thomas, co-founder and executive chairman of CleverTap. “Focus on unit economics, growth, cash efficiency, and profitability.”

The strategy has worked out quite well for him. In August, CleverTap announced a $105 million Series D round. CDPQ led the deal, writing a check for $75 million; other investors included Tiger Global and Sequoia India.

CleverTap’s software leverages artificial intelligence (AI) and machine learning (ML) to engage and retain users. Since its launch six years ago, the company has amassed a customer base of 1,200 brands.

“The overall funding environment has gone back to basics,” said Thomas. “Funding is definitely available for great ideas — at the early stages — and sustainable businesses at the growth stage.”

See the Top Artificial Intelligence (AI) Software for 2022

airSlate: Document Automation

airSlate raised $51.5 million in June in a round led by G Squared and UiPath, valuing the company at $1.25 billion.

Founded in 2008, airSlate has created an automation platform for e-signatures, PDF editing, document management, and workflows. The platform has over 100 million users.

“So what attracts investors?” said Borya Shakhnovich, CEO of airSlate. “Put simply, financials that speak for themselves. This means breaking even early on in the company’s journey, procuring impressive revenue figures, and demonstrating growth of the customer base.

“Touting solid financials for venture capital interest might sound painstakingly intuitive, but it’s not always that simple,” Shakhnovich added. “I often liken investors to shoes — there’s a lot of them to choose from, and some will fit better than others. A lot of founders feel like their purpose is to win every investor, but that’s not always possible. Many investors demand brand recognition and a firm customer base over financial stability. The best approach is to stand by your organization’s strength and identify like-minded investors.”

Also read: Top RPA Tools 2022: Robotic Process Automation Software

Tropic: Procurement Analytics

Earlier in the year, Tropic raised $40 million in a Series A round led by Insight Partners. The company’s software helps businesses manage procurement more effectively, which matters given that the average company overpays for software by 30%.

Customers include Vimeo, Zapier, and Qualtrics, and the company manages over $300 million in spend.

“At Tropic, we have a unique vantage point in that we can see how businesses are truly performing based on the purchasing behaviors of hundreds of companies,” said Dave Campbell, CEO and co-founder of Tropic. “We power these purchases, which gives us line of sight into who is performing well, who is churning, and who is struggling to get traction.”

Campbell points to the following lessons from the companies that are getting funded:

  • They offer something that thrives in a downturn, such as cost-cutting or efficiency-improving solutions.
  • They emphasize retention over growth. Companies raising now are in the 120% NRR (Net Revenue Retention) range, even if they are only growing 50% year-over-year. 300% growth with 50% NRR won’t attract investors.
  • They have strong efficiency: sales efficiency of over 1 and CAC (Customer Acquisition Cost) payback of less than 12 months (a worked example of these metrics follows this list).
  • They power a mission-critical service. Nice-to-haves are out.
  • They are willing to discount their valuation.
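
For readers less familiar with these metrics, here is a minimal sketch of how they might be computed. The revenue figures are hypothetical and used only for illustration, and the formulas are common simplified definitions, not the exact models any particular investor uses.

    # Illustrative SaaS metrics with made-up numbers (all figures hypothetical).

    def net_revenue_retention(start_arr, expansion, churn):
        """NRR: recurring revenue kept from existing customers, including upsells."""
        return (start_arr + expansion - churn) / start_arr

    def sales_efficiency(new_arr, sales_marketing_spend):
        """New ARR generated per dollar of sales and marketing spend."""
        return new_arr / sales_marketing_spend

    def cac_payback_months(cac, monthly_gross_profit_per_customer):
        """Months of gross profit needed to recover the cost of acquiring a customer."""
        return cac / monthly_gross_profit_per_customer

    # $10M starting ARR, $3M expansion, $1M churn -> 1.2, i.e., 120% NRR
    print(net_revenue_retention(10_000_000, 3_000_000, 1_000_000))
    # $6M new ARR on $5M sales and marketing spend -> 1.2, above the bar of 1
    print(sales_efficiency(6_000_000, 5_000_000))
    # $12,000 CAC recovered at $1,500 gross profit per month -> 8-month payback
    print(cac_payback_months(12_000, 1_500))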

Lightning AI: Open-Source AI Development

In June, Lightning AI announced a Series B funding of $40 million. The lead was Coatue and other investors included Index, Bain, First Minute Capital, and the Chainsmokers’ Mantis VC.

The company offers an open-source platform for building AI models. It has been downloaded more than 22 million times since 2019 and is used by 10,000 organizations across the globe.

“These latest changes in the funding environment have made it more important than ever for businesses to make it explicitly clear how they create value for their users and customers,” said William Falcon, CEO and co-founder of Lightning AI. “We expect to see an increasing amount of focus placed on the ability to synthesize what a business does into clear and well-articulated value propositions and a larger focus on efficient growth backed by strong unit economics.”

Falcon stresses that founders need to find investors who align with the company’s vision. True, in a rough funding environment, it can be difficult to say “no” to an offer of millions of dollars. But for the company’s long-term prospects, turning down a poor fit may be the right choice.

“While there’s no shortage of MLOps products today, it was important to us from the beginning that we found investors who understood that Lightning AI is not building simply another machine learning platform, we are building the foundational platform that will unite the machine learning space,” said Falcon.

Data Lake Governance & Security Issues

Analysis of data fed into data lakes promises to provide enormous insights for data scientists, business managers, and artificial intelligence (AI) algorithms. However, governance and security managers must also ensure that the data lake conforms to the same data protection and monitoring requirements as any other part of the enterprise.

To enable data protection, data security teams must ensure only the right people can access the right data and only for the right purpose. To help the data security team with implementation, the data governance team must define what “right” is for each context. For an application with the size, complexity and importance of a data lake, getting data protection right is a critically important challenge.

See the Top Data Lake Solutions

From Policies to Processes

Before an enterprise can worry about data lake technology specifics, its governance and security teams need to review the company’s current policies. Policies covering overarching concerns such as access, network security, and data storage establish the baseline principles that executives will expect to be applied to every technology within the organization, including data lakes.

Some changes to existing policies may need to be proposed to accommodate the data lake technology, but the policy guardrails are there for a reason: to protect the organization against lawsuits, legal violations, and risk. With the overarching requirements in hand, the teams can turn to the practical considerations of implementing those requirements.

Data Lake Visibility

The first requirement to tackle for security or governance is visibility. In order to develop any control or prove control is properly configured, the organization must clearly identify:

  • What is the data in the data lake?
  • Who is accessing the data lake?
  • What data is being accessed by whom?
  • What is being done with the data once accessed?

Different data lakes provide these answers using different technologies, but the capabilities generally fall into two categories: data classification and activity monitoring/logging.

Data classification

Data classification determines the value and inherent risk of the data to an organization. The classification determines what access might be permitted, what security controls should be applied, and what levels of alerts may need to be implemented.

The desired categories will be based upon criteria established by data governance, such as:

  • Data Source: Internal data, partner data, public data, and others
  • Regulated Data: Privacy data, credit card information, health information, etc.
  • Department Data: Financial data, HR records, marketing data, etc.
  • Data Feed Source: Security camera videos, pump flow data, etc.

The visibility into these classifications depends entirely upon the ability to inspect and analyze the data. Some data lake tools offer built-in features, or additional tools that can be licensed, to enhance classification capabilities, such as:

  • Amazon Web Services (AWS): AWS offers Amazon Macie as a separately enabled tool to scan for sensitive data in a repository.
  • Azure: Customers use built-in features of the Azure SQL Database, Azure Managed Instance, and Azure Synapse Analytics to assign categories, and they can license Microsoft Purview to scan for sensitive data in the dataset such as European passport numbers, U.S. social security numbers, and more.
  • Databricks: Customers can use built-in features to search and modify data (compute fees may apply). 
  • Snowflake: Customers use inherent features that include some data classification capabilities to locate sensitive data (compute fees may apply).

For sensitive data or internal designations not supported by features and add-on programs, the governance and security teams may need to work with the data scientists to develop searches. Once the data has been classified, the teams will then need to determine what should happen with that data.
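
As an illustration of what such a search might look like, here is a minimal sketch that scans text records for patterns resembling U.S. Social Security numbers and payment card numbers. The patterns and labels are simplified assumptions for the example, not a production-grade classifier; real services such as Amazon Macie or Microsoft Purview use far more robust detection.

    import re

    # Simplified detection patterns, for illustration only.
    PATTERNS = {
        "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "payment_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    }

    def classify_record(text):
        """Return the set of sensitive-data labels found in a raw text record."""
        return {label for label, pattern in PATTERNS.items() if pattern.search(text)}

    # Hypothetical records pulled from the data lake.
    records = ["order 1138 shipped to warehouse 7", "ssn on file: 123-45-6789"]
    for record in records:
        labels = classify_record(record)
        if labels:
            print(f"flag for review: {sorted(labels)}")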

For example, Databricks recommends deleting personal information from the European Union (EU) that falls under the General Data Protection Regulation (GDPR). This policy would avoid expensive future compliance issues with the EU’s “right to be forgotten,” which would otherwise require a search for and deletion of a consumer’s data upon each request.

Other common examples for data treatment include:

  • Data accessible for registered partners (customers, vendors, etc.)
  • Data only accessible by internal teams (employees, consultants, etc.)
  • Data restricted to certain groups (finance, research, HR, etc.)
  • Regulated data available as read-only
  • Important archival data, with no write-access permitted

The sheer size of data in a data lake can complicate categorization. Initially, data may need to be categorized by its input source, and teams will have to make their best guesses about the content until it can be analyzed by other tools.

In all cases, once data governance has determined how the data should be handled, a policy should be drafted that the security team can reference. The security team will develop controls that enforce the written policy and develop tests and reports that verify that those controls are properly implemented.

See the Top Governance, Risk and Compliance (GRC) Tools

Activity monitoring and logging

The logs and reports provided by data lake tools supply the visibility needed to test and report on data access within a data lake. This monitoring and logging of activity provides the key evidence needed to verify that data controls are effective and to ensure no inappropriate access is occurring.

As with data inspection, the tools will have various built-in features, but additional licenses or third-party tools may need to be purchased to monitor the necessary spectrum of access. For example:

  • AWS: AWS CloudTrail is a separately enabled tool that tracks user activity and events, and AWS CloudWatch collects logs, metrics, and events from AWS resources and applications for analysis.
  • Azure: Diagnostic logs can be enabled to monitor API (application programming interface) requests and API activity within the data lake. Logs can be stored within the account, sent to log analytics, or streamed to an event hub. And other activities can be tracked through other tools such as Azure Active Directory (access logs).
  • Google: Google Cloud DLP detects different international PII (personal identifiable information) schemes.
  • Databricks: Customers can enable logs and direct the logs to storage buckets.
  • Snowflake: Customers can execute queries to audit specific user activity.

Data governance and security managers must keep in mind that data lakes are huge and that the access reports associated with the data lakes will be correspondingly immense. Storing the records for all API requests and all activity within the cloud may be burdensome and expensive.

Detecting unauthorized usage requires granular controls, so that inappropriate access attempts generate alerts that are meaningful, actionable, and limited. The definitions of meaningful, actionable, and limited will vary based upon the capabilities of the team or the software used to analyze the logs and must be honestly assessed by the security and data governance teams.
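
As a simplified illustration of turning raw access records into actionable alerts, the sketch below flags reads of restricted objects by principals outside an allowed group. The log format, group names, and object paths are assumptions invented for the example; real logs such as CloudTrail events are far richer.

    # Hypothetical access-log entries; real platform logs carry many more fields.
    access_log = [
        {"user": "alice", "group": "finance", "object": "finance/budget.parquet"},
        {"user": "mallory", "group": "marketing", "object": "hr/salaries.parquet"},
    ]

    # Policy written by data governance: which groups may touch which prefixes.
    allowed_groups_by_prefix = {
        "finance/": {"finance"},
        "hr/": {"hr"},
    }

    def audit(entries):
        """Yield an alert for any access that falls outside the written policy."""
        for entry in entries:
            for prefix, groups in allowed_groups_by_prefix.items():
                if entry["object"].startswith(prefix) and entry["group"] not in groups:
                    yield f"ALERT: {entry['user']} ({entry['group']}) accessed {entry['object']}"

    for alert in audit(access_log):
        print(alert)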

Data Lake Controls

Useful data lakes will become huge repositories for data accessed by many users and applications. Good security will begin with strong, granular controls for authorization, data transfers, and data storage.

Where possible, automated security processes should be enabled to permit rapid response and consistent controls applied to the entire data lake.

Authorization

Authorization in data lakes works much as it does in any other IT infrastructure. IT or security managers assign users to groups, groups can be assigned to projects or companies, and each of these users, groups, projects, or companies can be assigned to resources.

In fact, many of these tools will link to existing user control databases such as Active Directory, so existing security profiles may be extended to the data lake. Data governance and data security teams will need to associate the various categorized resources within the data lake with specific groups, such as:

  • Raw research data associated with the research user group
  • Basic financial data and budgeting resources associated with the company’s internal users
  • Marketing research, product test data, and initial customer feedback data associated with the specific new product project group

Most tools will also offer additional security controls such as security assertion markup language (SAML) or multi-factor authentication (MFA). The more valuable the data, the more important it will be for security teams to require the use of these features to access the data lake data.

In addition to the classic authorization processes, the data managers of a data lake also need to determine the appropriate authorization to provide to API connections with data lakehouse software and data analysis software and for various other third-party applications connected to the data lake.

Each data lake will have its own way of managing APIs and authentication processes. Data governance and data security managers need to clearly outline the high-level rules and allow the data security teams to implement them.

As a best practice, many data lake vendors recommend configuring data to deny access by default, which forces data governance managers to grant access explicitly. The implemented rules should then be verified through testing and by monitoring the access records.
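
A minimal sketch of the deny-by-default pattern is shown below. The group names, grant table, and resource paths are hypothetical, and a real deployment would delegate these checks to the platform’s own IAM or Active Directory integration rather than application code.

    # Grants must be written explicitly; anything not listed is denied.
    grants = {
        "research/raw/": {"research"},
        "finance/budgets/": {"internal"},
        "projects/new-product/": {"new-product-team"},
    }

    def is_allowed(user_groups, resource_path):
        """Deny by default: allow only if a grant prefix matches and the groups overlap."""
        for prefix, allowed_groups in grants.items():
            if resource_path.startswith(prefix) and user_groups & allowed_groups:
                return True
        return False

    print(is_allowed({"research"}, "research/raw/assay-42.csv"))   # True
    print(is_allowed({"marketing"}, "finance/budgets/q3.xlsx"))    # False: group not granted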

Data transfers

A giant repository of valuable data only becomes useful when it can be tapped for information and insight. To do so, the data or query responses must be pulled from the data lake and sent to the data lakehouse, third-party tool, or other resource.

These data transfers must be secure and controlled by the security team. The most basic security measure requires all traffic to be encrypted by default, but some tools will allow for additional network controls such as:

  • Limit connection access to specific IP addresses, IP ranges, or subnets (illustrated in the sketch after this list)
  • Private endpoints
  • Specific networks
  • API gateways
  • Specified network routing and virtual network integration
  • Designated tools (Lakehouse application, etc.)
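
As a small illustration of the first control in the list above, IP-range restriction, the sketch below uses Python’s ipaddress module to test whether a client address falls inside an allowed subnet. The subnets and addresses are placeholders; in practice this enforcement happens in the platform’s network configuration, not in application code.

    import ipaddress

    # Placeholder corporate subnets permitted to reach the data lake endpoints.
    ALLOWED_NETWORKS = [
        ipaddress.ip_network("10.20.0.0/16"),
        ipaddress.ip_network("192.168.50.0/24"),
    ]

    def connection_permitted(client_ip):
        """Return True only if the client address sits inside an allowed subnet."""
        address = ipaddress.ip_address(client_ip)
        return any(address in network for network in ALLOWED_NETWORKS)

    print(connection_permitted("10.20.14.7"))   # True
    print(connection_permitted("203.0.113.9"))  # False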

Data storage

IT security teams often use cloud storage best practices as a starting point for storing data in data lakes. This makes sense, since the data lake will likely be built on the same basic cloud storage offered by the cloud platforms.

When setting up data lakes, vendors recommend making them private, with anonymous access disabled, to prevent casual discovery. The data will also typically be encrypted at rest by default.

Some cloud vendors will offer additional options such as classified storage or immutable storage that provides additional security for stored data. When and how to use these and other cloud strategies will depend upon the needs of the organization.

See the Top Big Data Storage Tools

Developing Secure and Accessible Data Storage

Data lakes provide enormous value by providing a single repository for all enterprise data. Of course, this also paints an enormous target on the data lake for attackers that might want access to that data!

Basic data governance and security principles should be implemented first as written policies that can be approved and verified by the non-technical teams in the organization (legal, executives, etc.). Then, it will be up to data governance to define the rules and data security teams to implement the controls to enforce those rules.

Next, each security control will need to be continuously tested and verified to confirm that the control is working. This is a cyclical, and sometimes even a continuous, process that needs to be updated and optimized regularly.

While keeping data safe is certainly important, businesses also need to make sure the data remains accessible, so they don’t lose the utility of the data lake. By following these high-level processes, security and data lake experts can help ensure the details align with the principles.

Read next: Data Lake Strategy Options: From Self-Service to Full-Service

Metaverse’s Biggest Potential Is In Enterprises

Last year, Match Group – which operates online platforms like Tinder, Match.com and Hinge – shelled out $1.725 billion to acquire Hyperconnect. This was a play on the metaverse.

Unfortunately, Match had challenges in making the strategy work. In the latest shareholder letter, Match CEO Bernard Kim noted: “I’ve instructed the Hyperconnect team to iterate but not invest heavily in metaverse at this time. We’ll continue to evaluate this space carefully, and we will consider moving forward at the appropriate time when we have more clarity on the overall opportunity and feel we have a service that is well-positioned to succeed.”

This is not a one-off. Even the giant Meta has had its own problems, despite the company’s enormous resources and global user base. During the past year, its stock price has plunged from $378 to $178.

The metaverse clearly faces some challenges.

See also: How Revolutionary Are Meta’s AI Efforts?

First-mover Opportunities in the Enterprise?

The metaverse’s early challenges should come as no surprise. It’s never easy to launch new technologies.

But for the metaverse, the first-mover opportunities may actually not be in the consumer space.  Surprise: They may emerge in the enterprise.

“Businesses today are already leveraging the metaverse to drive new interactions,” said Matt Barrington, Principal of Digital & Emerging Technologies, EY. “These experiences are driven by both existing technology stacks and Web 3.0 technology stacks, bringing in new business models and ways to create, store, and exchange value. We are seeing mass experimentation across the market as companies explore business-relevant use cases and assess the impact of the metaverse on their business and customers.”

So let’s take a deeper look at the enterprise opportunities in the new virtual world of the metaverse.

Metaverse Applications

When it comes to the consumer metaverse, the types of use cases are limited; it’s really about gaming-type experiences. In terms of monetization, there is the purchase of digital items, subscriptions, and sponsorships. Interestingly, various brands have purchased virtual real estate in the metaverse.

But as for the enterprise, there are seemingly endless applications. In fact, each industry can have its own set of metaverses.

“Consistent with the findings of our recent Metaverse surveys, using metaverse environments for purposes of delivering new experiences to the workforce for training, onboarding or recruiting are immediate use cases,” said Emmanuelle Rivet, Vice Chair, U.S. TMT and Global Technology Leader, PwC. “In addition, metaverse environments provide a place for connecting and engaging with a dispersed workforce including front line workers who may feel detached from the ‘center’ or ‘corporate.’ This is interesting but it also provides the opportunity for employees to be exposed to the metaverse, get familiar with it and effectively be up-skilled by experimentation, providing a platform for innovation and development of more use cases for companies.”

There are also interesting use cases for digital twins of physical environments, which can be made hyper-realistic and physically accurate.

“The physical environment to be replicated may be natural, or it may be something that was constructed, such as a building or other type of structure, an industrial operation, or a transportation network,” said Andrew Blau, Managing Director, U.S. Leader, Eminence & Insights, Deloitte Consulting. “Humans, robots, and AI agents can work together inside these digital twins to plan, design, and test—accelerating innovation and planning cycles for a variety of business needs.”

Also read: The Metaverse: Catching the Next Internet-Like Wave

Metaverse Strategies

The playbook for the metaverse is still in the early stages. Mistakes will be inevitable. But there are some guidelines that will help.

“Employees and customers are both looking for new experiences in the metaverse – and that means ensuring that virtual avatars, augmented reality and other forms of interaction are user-friendly enough to make collaboration and training simpler than it is in real life,” said Adrian McDermott, CTO of Zendesk.  “You need to prioritize immersion.”

And yes, there will need to be much due diligence of the tech stacks. They can be expensive and complicated.

“Firms need trusted technology partners that build, or vet and collaborate with, the best-in-class technology, as well as the means to plan, deploy and manage the technology so solutions that accelerate business today don’t become a roadblock tomorrow,” said Vishal Shah, General Manager of XR and Metaverse, Lenovo. “This also requires an open solution to always make the best hardware and software components for the use cases. … The fact is ‘Open’ always wins and will again in this new world.”

Another part of the strategy – which can easily be overlooked – is finance transformation.  Without this, the chances of success decline precipitously.

“Organizations will need to develop completely different approaches to finance, accounting, risk and compliance processes to sustain all of the major innovations coming with the metaverse, including monetization and metaverse economy innovations such as crypto currency and NFTs,” said Brajesh Jha, SVP & Global Head of Media, Publishing and Entertainment, Genpact.

Don’t Get Left Behind

The temptation for enterprises, though, is to take a wait-and-see approach with the metaverse.  But this could mean falling behind competitors. And it may be extremely tough to catch up.

“The metaverse presents a significant opportunity for business,” said Mike Storiale, VP, Innovation Development, Synchrony. “This is potentially a new dimension of commerce that we haven’t seen since the late 1990s with e-commerce.”

Read next: The Value of the Metaverse for Small Businesses

Data Lake Strategy Options: From Self-Service to Full-Service

Data continues to grow in importance for customer insights, projecting trends, and training artificial intelligence (AI) or machine learning (ML) algorithms. In a quest to fully encompass all data sources, data researchers maximize the scale and scope of data available by dumping all corporate data into one location.

On the other hand, having all that critical data in one place can be an attractive target for hackers, who continuously probe defenses looking for weaknesses, and the penalties for data breaches can be enormous. IT security teams need a system that can differentiate between categories of data in order to isolate and secure them against misuse.

Data lakes are the current solution for maximizing data availability and protection. Data managers and data security teams at large enterprises can choose from many different data lake vendors to suit their needs.

However, while anyone can create a data lake, not everyone will have the resources to achieve scale, extract value, and protect their resources on their own. Fortunately, vendors offer robust tools that permit smaller teams to obtain the benefits of a data lake without requiring the same resources to manage them.

See the Top Data Lake Solutions

What are Data Lakes?

Data lakes create a single repository for an organization’s raw data. Data feeds bring in data from databases, SaaS platforms, web crawlers, and even edge devices such as security cameras or industrial heat pumps.

Like a giant hard drive, data lakes can incorporate folder structures and apply security to specific folders, limiting access, read/write privileges, and deletion privileges for users and applications. Unlike a hard drive, however, data lakes should be able to grow in size indefinitely and never require data to be deleted because of space restrictions.

Data lakes support all data types, scale automatically, and support a wide range of analytics, from built-in features to external tools supported by APIs. Analytic tools can perform metadata or content searches or categorize data without changing the underlying data itself.

Self-service Data Lake Tools

Technically, if a company can fit all of its data onto a single hard drive, that is the equivalent of a data lake. However, most organizations have astronomically more data than that, and large enterprises need huge repositories.

Some organizations create their own data lakes in their own data centers. This endeavor requires much more investment in:

  • Capital expense: buildings, hardware, software, access control systems
  • Operational expense: electrical power, cooling systems, high-capacity internet/network connections, maintenance and repair costs
  • Labor expense: IT and IT security employees to maintain the hardware, physical security

Vendors in this category provide tools needed for a team to create their own data lake. Organizations choosing these options will need to supply more time, expenses, and expertise to build, integrate, and secure their data lakes.

Apache: Hadoop & Spark

The Apache open-source projects provide the basis for many cloud computing tools. To create a data lake, an organization could combine Hadoop and Spark to create the base infrastructure and then consider related projects or third-party tools in the ecosystem to build out capabilities.

Apache Hadoop provides scalable, distributed processing of large data sets containing structured or unstructured content. It supplies the storage layer along with basic search and analysis tools.

Apache Spark provides a scalable open-source engine that batches data, streams data, performs SQL analytics, trains machine learning algorithms, and performs exploratory data analysis (EDA) on huge data sets. Spark offers deeper analysis tools and more sophisticated examinations of data than a basic Hadoop deployment.
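
As a rough illustration of how Spark is typically pointed at files in a lake, here is a minimal PySpark sketch. The bucket path and column name are hypothetical, and a production job would run against a cluster rather than a local session.

    from pyspark.sql import SparkSession

    # Start (or reuse) a Spark session; in production this would point at a cluster.
    spark = SparkSession.builder.appName("lake-exploration").getOrCreate()

    # Read raw JSON events straight from object storage; Spark infers the schema on read.
    events = spark.read.json("s3a://example-lake/raw/events/")  # hypothetical path

    # Simple exploratory aggregation: event counts per type.
    events.groupBy("event_type").count().orderBy("count", ascending=False).show()

    spark.stop()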

Hewlett Packard Enterprise (HPE) GreenLake

The HPE GreenLake service provides pre-integrated hardware and software that can be deployed in internal data centers or in colocation facilities. HPE handles the heavy lifting for the deployment and charges clients based upon their usage.

HPE will monitor usage, scale the Hadoop data lake deployment based upon need, and provide support for the design and deployment of other applications. This service turbo-charges a typical internal Hadoop deployment by outsourcing some of the labor and expertise to HPE.

Cloud Data Lake Tools

Cloud data lake tools provide the infrastructure and the basic tools needed to provide a turn-key data lake. Customers use built-in tools to attach data feeds, storage, security, and APIs to access and explore the data.

After selecting options, some software packages will already be integrated into the data lake upon launch. When a customer selects a cloud option, it will immediately be ready to intake data and will not need to wait for shipping, hardware installation, software installation, etc.

However, in an attempt to maximize the customizability of the data lake, these tools tend to push more responsibility onto the customer. Connecting data feeds, hooking up external data analytics, and applying security will be more manual processes than with full-service solutions.

Some data lake vendors provide data lakehouse tools to attach to the data lake and provide an interface for data analysis and transfer. There may also be other add-on tools available that provide the features available in full-service solutions.

Customers can either choose the bare-bones data lake and do more of the heavy lifting themselves or pay extra for features that create a more full-service version. These vendors also tend not to encourage multi-cloud development, focusing instead on driving more business toward their own cloud platforms.

Amazon Web Services (AWS) Data Lake

AWS provides an enormous range of options for cloud infrastructure. Its data lake offering provides an automatically configured collection of core AWS services to store and process raw data.

Incorporated tools permit users or apps to analyze, govern, search, share, tag, and transform subsets of data internally or with external users. Federated templates integrate with Microsoft Active Directory to incorporate existing data segregation rules already deployed internally within a company.

Google Cloud

Google offers data lake solutions that can house an entire data lake or simply help process a data lake workload from an external source (typically internal data centers). Google Cloud claims that moving from an on-premises Hadoop deployment to a Google Cloud-hosted deployment can lower costs by 54%.

Google offers its own BigQuery analytics that captures data in real-time using a streaming ingestion feature. Google supports Apache Spark and Hadoop migration, integrated data science and analytics, and cost management tools.

Microsoft Azure

Microsoft’s Azure Data Lake solution deploys Apache Spark and Apache Hadoop as fully-managed cloud offerings as well as other analytic clusters such as Hive, Storm, and Kafka. Azure data lake includes Microsoft solutions for enterprise-grade security, auditing, and support.

Azure Data Lake integrates easily with other Microsoft products or existing IT infrastructure and is fully scalable. Customers can define and launch a data lake very quickly and use their familiarity with other Microsoft products to intuitively navigate through options.

See the Top Big Data Storage Tools

Full-service Data Lake Tools

Full-service data lake vendors add layers of security and user-friendly GUIs, and they constrain some features in favor of ease of use. These vendors may also build additional analysis features into their offerings to provide extra value.

Some companies cannot or strategically choose not to store all of their data with a single cloud provider. Other data managers may simply want a flexible platform or might be trying to stitch together data resources from acquired subsidiaries that used different cloud vendors.

Most of the vendors in this category do not host data themselves; they act as agnostic data managers and promote multi-cloud data lakes. However, some of these vendors do offer their own cloud solutions, with a fully integrated, full-service offering that can access multiple clouds or transition the data to their fully controlled platform.

Cloudera Cloud Platform

Cloudera’s Data Platform provides unifying software to ingest and manage a data lake potentially spread across public and private cloud resources. Cloudera optimizes workloads based on analytics and machine learning, and it provides integrated interfaces to secure and govern platform data and metadata.

Cohesity

Cohesity’s Helios platform offers a unified platform that provides data lake and analysis capabilities. The platform may be licensed as a SaaS solution, as software for self-hosted data lakes, or for partner-managed data lakes.

Databricks

Databricks provides data lakehouse and data lake solutions built on open-source technology with integrated security and data governance. Customers can explore data, build models collaboratively, and access preconfigured ML environments. Databricks works across multiple cloud vendors and manages the data repositories through a consolidated interface.

Domo

Domo provides a platform that enables a full range of data lake solutions, from storage to application development. Domo can augment existing data lakes, or customers can host their data on the Domo cloud.

IBM

IBM’s cloud-based data lakes can be deployed on any cloud, and the company builds governance, integration, and virtualization into the core principles of its solution. IBM data lakes can access IBM’s pioneering Watson AI for analysis as well as many other IBM tools for queries, scalability, and more.

Oracle

Oracle’s Big Data Service deploys a private version of Cloudera’s cloud platform, with integration into Oracle’s own Data Lakehouse solution and the Oracle cloud platform. Oracle builds on its mastery of database technology to provide strong tools for data queries, data management, security, governance, and AI development.

Snowflake

Snowflake provides a full-service data lake solution that can integrate storage and computing solutions from AWS, Microsoft, or Google. Data managers do not need to know how to set up, maintain, or support servers and networks, and therefore can use Snowflake without previously establishing any cloud databases.

Also read: Snowflake vs. Databricks: Big Data Platform Comparison

Choosing a Data Lake Strategy and Architecture

Data analytics continues to rise in importance as companies find more uses for wider varieties of data. Data lakes provide an option to store, manage, and analyze all data sources for an organization even as they try to figure out what is important and useful.

This article provides an overview of different strategies to deploy data lakes and different technologies available. The list of vendors is not comprehensive and new competitors are constantly entering the market.

Don’t start by selecting a vendor. Start instead with an understanding of the company resources available to support a data lake.

If the available resources are small, the company will likely need to pursue a full-service option over an in-house data center. However, many other important characteristics play a role in determining the optimal vendor, such as:

  • Business use case
  • AI compatibility
  • Searchability
  • Compatibility with data lakehouse or other data searching tools
  • Security
  • Data governance

Once established, data lakes can be moved, but this could be a very expensive proposition since most data lakes will be enormous. Organizations should take their time and try test runs on a smaller scale before they commit fully to a single vendor or platform.

Read next: 10 Top Data Companies

What’s New With Google Vertex AI?

Sundar Pichai introduced Vertex AI to the world during the Google I/O 2021 conference last year, positioning it against managed AI platforms from the likes of Amazon Web Services (AWS) and Microsoft Azure in the global AI market.

The Alphabet CEO once said, “Machine learning is a core, transformative way by which we’re rethinking how we’re doing everything.”

A November 2020 study by Gartner predicted a near-20% growth rate for managed services like Vertex AI. Gartner said that as enterprises invest more in mobility and remote collaboration technologies and infrastructure, growth in the public cloud industry will be sustained through 2024.

Vertex AI replaces legacy services like AI Platform Training and Prediction, AI Platform Data Labeling, AutoML Natural Language, AutoML Vision, AutoML Video, AutoML Tables, and Deep Learning Containers. Let’s take a look at how the platform has fared and what’s changed over the last year.

Also read: Top Artificial Intelligence (AI) Software

What Is Google Vertex AI?

Google Vertex AI is a cloud-based third-party machine learning (ML) platform for deploying and maintaining artificial intelligence (AI) models. The machine learning operations (MLOps) platform blends automated machine learning (AutoML) and AI Platform into a unified application programming interface (API), client library, and user interface (UI).

Previously, data scientists had to run millions of datasets to train algorithms, but the Vertex technology stack now does the heavy lifting. It has the computing power to solve complex problems and easily run billions of iterations, and it recommends the best algorithms for specific needs.

Vertex AI uses a standard ML workflow consisting of stages like data collection, data preparation, training, evaluation, deployment, and prediction; a brief code sketch of this workflow appears after the list below. Although Vertex AI has many features, we’ll look at some of the key ones here.

  • Whole ML Workflow Under a Unified UI Umbrella: Vertex AI comes with a unified UI and API for every Google Cloud service based on AI.
  • Integrates With Common Open-Source Frameworks: Vertex AI blends easily with commonly used open-source frameworks like PyTorch and TensorFlow and supports other ML tools through custom containers.
  • Access to Pretrained APIs for Different Datasets: Vertex AI makes it easy to integrate video, images, translation, and natural language processing (NLP) with existing applications. It empowers people with minimal expertise and effort to train ML models to meet their business needs.
  • End-to-End Data and AI Integration: Vertex AI Workbench enables Vertex AI to integrate natively with Dataproc, Dataflow, and BigQuery. As a result, users can either develop or run ML models in BigQuery or export data from BigQuery and execute ML models from Vertex AI Workbench.
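
To make that workflow concrete, here is a hedged sketch using the Vertex AI Python SDK (google-cloud-aiplatform) to train and deploy an AutoML tabular model. The project ID, bucket path, column name, and machine type are placeholders, and exact parameter names can vary between SDK versions.

    from google.cloud import aiplatform

    # Placeholders: substitute a real project, region, and Cloud Storage path.
    aiplatform.init(project="my-project", location="us-central1")

    # Data preparation: register a tabular dataset stored in Cloud Storage.
    dataset = aiplatform.TabularDataset.create(
        display_name="churn-data",
        gcs_source="gs://my-bucket/churn.csv",
    )

    # Training: let AutoML search for a model that predicts the target column.
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="churn-training",
        optimization_prediction_type="classification",
    )
    model = job.run(dataset=dataset, target_column="churned")

    # Deployment and prediction: serve the model on a managed endpoint.
    endpoint = model.deploy(machine_type="n1-standard-4")
    print(endpoint.predict(instances=[{"tenure": "12", "plan": "basic"}]))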

Also read: The Future of Natural Language Processing is Bright

What’s Included in the Latest Update?

Google understands research is the only way to become an AI-first organization. Many of Google’s product offerings initially started as internal research projects. DeepMind’s AlphaFold project led to running protein prediction models in Vertex AI.

Similarly, research into neural architecture search provided the groundwork for Vertex AI NAS, which allows data science teams to train models with lower latency and power requirements. Empathy also plays a significant role when AI use cases are considered. Some of the latest offerings within Vertex AI from Google include:

Reduction Server

According to Google, the AI training Reduction Server is an advanced technology that optimizes the latency and bandwidth of multisystem distributed training, which is a way of spreading ML training across multiple machines, GPUs (graphics processing units), CPUs (central processing units), or custom chips. As a result, training takes less time and uses fewer resources.

Tabular Workflows

This feature aims to customize the ML model creation process. Tabular Workflows let users decide which parts of the workflow they want AutoML technology to handle and which parts they would rather engineer themselves.

Vertex AI lets elements of Tabular Workflows be integrated into existing pipelines. Google has also added the latest managed algorithms, including advanced research models like TabNet, algorithms for feature selection, model distillation, and many more functions.

Serverless Apache Spark

Vertex AI has been integrated with serverless Apache Spark, a unified open-source engine for large-scale data analytics. Vertex AI users can easily start a serverless Spark session for interactive code development.

The partnership between Google and Neo4j enables Vertex users to analyze data features in Neo4j’s platform and then deploy ML models with Vertex. Similarly, the collaboration between Labelbox and Google made it possible to access Labelbox’s data-labeling services for various datasets, images and text among them, from the Vertex dashboard.

Example-based Explanations

When models run into mislabeled data, Example-based Explanations offer a better way to respond. This new Vertex feature uses example-based explanations to diagnose and resolve data issues.

Problem-Solving With Vertex AI

Google claims that Vertex AI requires 80% fewer lines of code than other platforms to train AI/ML models with custom libraries, and its custom tools support advanced ML coding. Vertex AI’s MLOps tools remove the complexity of self-service model maintenance by streamlining ML pipeline operations, while the Vertex Feature Store lets teams serve, share, and reuse ML features.

Data scientists with no formal AI/ML training can use Vertex AI, as it offers tools to manage data, create prototypes, experiment, and deploy ML models. It also allows them to interpret and monitor the AI/ML models in production.

A year after the launch of Vertex, Google is aligning itself toward real-world applications. The company’s mission is solving human problems, as showcased at Google I/O. This likely means that its efforts will be directed toward finding a transformative way of doing things through AI.

Read next: Top Data Lake Solutions for 2022

Data Lake vs. Data Warehouse: What’s the Difference?

Data lakes and data warehouses are two of the most popular forms of data storage and processing platforms, both of which can be employed to improve a business’s use of information.

However, these tools are designed to accomplish different tasks, so their functions are not exactly the same. We’ll go over those differences here, so you have a clear idea of what each one entails and can choose which would suit your business needs.

See the Top Data Lake Solutions and Top Data Warehouses

What is a data lake?

A data lake is a storage repository that holds vast amounts of raw data in its native format until it is needed. It uses a flat architecture to store data, which makes it easier and faster to query.

Data lakes are usually used for storing big datasets. They’re ideal for large files and great at integrating diverse datasets from different sources because they have no schema or structure to bind them together.

How does a data lake work?

A data lake is a central repository where all types of data can be stored in their native format. Any application or analysis can then access the data without the need for transformation.

The data in a data lake can be from multiple sources and structured, semi-structured, or unstructured. This makes data lakes very flexible, as they can accommodate any data. In addition, data lakes are scalable, so they can grow as a company’s needs change. And because data lakes store files in their original formats, there’s no need to worry about conversions when accessing that information.

Moreover, most companies using a data lake have found they can apply more sophisticated tools and processing techniques to their data than they could with traditional databases. A data lake also makes accessing enterprise information easier by enabling less frequently accessed information to be stored close to where it will be used, and it eliminates the need to perform additional preparation steps before analyzing the data. This adds up to much faster query response times and better analytical performance.

Also read: Snowflake vs. Databricks: Big Data Platform Comparison

What is a data warehouse?

A data warehouse is designed to store structured data that has been processed, cleansed, integrated, and transformed into a consistent format that supports historical reporting and analysis. It is a database used for reporting and data analysis and acts as a central repository of integrated data from one or more disparate sources that can be accessed by multiple users.

A data warehouse typically contains historical data that can be used to generate reports and analyze trends over time and is usually built with large amounts of data taken from various sources. The goal is to give decision-makers an at-a-glance view of the company’s overall performance.

How does a data warehouse work?

A data warehouse is a system that stores and analyzes data from multiple sources. It helps organizations make better decisions by providing a centralized view of their data. Data warehouses are typically used for reporting, analysis, predictive modeling, and machine learning.

To build a data warehouse, data must first be extracted from an organization’s various sources and transformed, then loaded into the database in a structured format. An ETL (extract, transform, load) tool typically puts all these pieces together and prepares the data for use in analytics tools. Once it’s ready, a software program runs reports or analyses on the data.
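
A minimal sketch of that extract-transform-load flow, using only the Python standard library, is shown below. The file name, table, and column names are invented for illustration, and a real warehouse load would target a dedicated warehouse engine rather than SQLite.

    import csv
    import sqlite3

    # Extract: read raw rows from a hypothetical export file.
    with open("orders_export.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    # Transform: clean types and keep only completed orders.
    cleaned = [
        (row["order_id"], row["region"].strip().upper(), float(row["amount"]))
        for row in rows
        if row.get("status") == "completed"
    ]

    # Load: write the structured result into a warehouse-style table.
    conn = sqlite3.connect("warehouse.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    conn.commit()

    # Report: a simple aggregation over the loaded data.
    for region, total in conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"):
        print(region, total)
    conn.close()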

Data warehouses may also include dashboards, which are interactive displays with graphical representations of information collected over time. These displays give people working in the company real-time insights into business operations, so they can take action quickly when necessary.

Also read: Top Big Data Storage Products

Differences between data lake and data warehouse

When it comes to storing big data, data lakes and data warehouses offer different features. A data warehouse resembles a traditional transactional database, storing data in tables with structured columns. A data lake, by comparison, is used for big data analytics: it stores raw, unstructured data that can be analyzed later for insights.

Parameters        | Data lake                                                | Data warehouse
Data type         | Unstructured data                                        | Processed data
Storage           | Data is stored in its raw form regardless of the source | Data is analyzed and transformed
Purpose           | Big data analytics                                       | Structured data analysis
Database schema   | Schema-on-read                                           | Schema-on-write
Target user group | Data scientists                                          | Business or data analysts
Size              | Stores all data                                          | Stores only structured data

Data type: Unstructured data vs. processed data

The main difference between the two is that in a data lake, the data is not processed before it is stored, while in a data warehouse it is. A data lake is a place to store all structured and unstructured data, and a data warehouse is a place to store only structured data. This means that a data lake can be used for big data analytics and machine learning, while a data warehouse can only be used for more limited data analysis and reporting.

Storage: Stored raw vs. clean and transformed

The data storage method is another important difference between a data lake and a data warehouse. A data lake stores raw information to make it easier to search through or analyze. On the other hand, a data warehouse stores clean, processed information, making it easier to find what is needed and make changes as necessary. Some companies use a hybrid approach, in which they have a data lake and an analytical database that complement each other.

Purpose: Undetermined vs. determined

The purpose of the data in a data lake is undetermined: businesses can use it for anything, whereas the purpose of data warehouse data is already determined and in use. This is why data lakes have more flexible data structures than data warehouses.

Where data lakes are flexible, data warehouses have more structured data. In a warehouse, data is pre-structured to fit a specific purpose. The nature of these structures depends on business operations. Moreover, a warehouse may contain structured data from an existing application, such as an enterprise resource planning (ERP) system, or it may be structured by hand based on user needs.

Database schema: Schema-on-read vs. schema-on-write

A data warehouse follows a schema-on-write approach, whereas a data lake follows a schema-on-read approach. In the schema-on-write model, tables are created ahead of time to store data. If the table’s organization has to be changed or columns need to be added later, it’s difficult, because every query using that table will need to be updated; such schema changes are expensive and take a lot of time to complete.

The schema-on-read model of a data lake, on the other hand, allows the store to hold any information in whatever form it arrives. New data types can be added as new columns, and existing columns can be changed at any time without affecting the running system. However, if specific rows need to be found quickly, this can be more difficult than in schema-on-write systems.
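
The contrast shows up clearly in a short PySpark sketch: in the schema-on-read case the structure is inferred when the file is queried, while the schema-on-write case requires the column definitions up front, much as a warehouse table would. The paths and column names here are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("schema-demo").getOrCreate()

    # Schema-on-read: raw JSON lands in the lake as-is; structure is inferred at query time.
    raw = spark.read.json("s3a://example-lake/raw/clicks/")  # hypothetical path
    raw.printSchema()

    # Schema-on-write: the structure is fixed before loading, as a warehouse table requires.
    declared = StructType([
        StructField("user_id", StringType()),
        StructField("clicked_at", TimestampType()),
        StructField("page", StringType()),
    ])
    curated = spark.read.schema(declared).json("s3a://example-lake/raw/clicks/")
    curated.write.mode("overwrite").parquet("s3a://example-lake/curated/clicks/")

    spark.stop()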

Users: Data scientist vs. business or data analysts

A data warehouse is designed to answer specific business questions, whereas a data lake is designed to be a storage repository for all of an organization’s data with no particular purpose. In a data warehouse, business users or analysts can interact with the data in a way that helps them find the answers they need to gain valuable insight into their operation.

On the other hand, there are no restrictions on how information can be used in a data lake, because it is not intended to serve a single use case. Users must take responsibility for curating the data themselves and ensuring it’s of good quality before any analysis takes place.

Size: All data up to petabytes of space vs. only structured data

The size difference is due to the data warehouse storing only structured data rather than all data. The two types of storage differ in many ways, but scope is the most fundamental: data lakes store all data, up to petabytes of it, while warehouses store only the structured subset.

Awareness of what type of storage is needed can help determine if a company should start with a data lake or a warehouse. A company may start with an enterprise-wide information hub for raw data and then use a more focused solution for datasets that have undergone additional processing steps.

Data lake vs. data warehouse: Which is right for me?

A data lake is a centralized repository that allows a company to store all of its structured and unstructured data at any scale, whereas a data warehouse is a relational database designed for query and analysis.

Determining which is most suitable will depend on a company’s needs. If large amounts of data need to be stored quickly, then a data lake is the way to go. However, a data warehouse is more appropriate if there is a need for analytics or insights into specific application data.

A successful strategy will likely involve implementing both models: a data lake for storing large volumes of raw, unstructured data and a data warehouse for analyzing specific structured data.

Read next: Snowflake vs. Databricks: Big Data Platform Comparison

The Toll Facial Recognition Systems Might Take on Our Privacy and Humanity

Artificial intelligence really is everywhere in our day-to-day lives, and one area that’s drawn a lot of attention is its use in facial recognition systems (FRS). This controversial collection of technology is one of the most hotly-debated among data privacy activists, government officials, and proponents of tougher measures on crime.

Enough ink has been spilled on the topic to fill libraries, but this article is meant to distill some of the key arguments, viewpoints, and general information related to facial recognition systems and the impacts they can have on our privacy today.

What Are Facial Recognition Systems?

The actual technology behind FRS, and who develops it, can be complicated. It's best to have a basic idea of how these systems work before diving into the ethical and privacy concerns around their use.

How Do Facial Recognition Systems Work?

On a basic level, facial recognition systems operate on a three-step process. First, the hardware, such as a security camera or smartphone, records a photo or video of a person.

That photo or video is then fed into an AI program, which then maps and analyzes the geometry of a person’s face, such as the distance between eyes or the contours of the face. The AI also identifies specific facial landmarks, like forehead, eye sockets, eyes, or lips.

Finally, all of these landmarks and measurements come together to create a digital signature, which the AI compares against its database of digital signatures to find a match or verify someone's identity. That digital signature is then stored in the database for future reference.
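
As a rough sketch of the matching step described above, the snippet below compares a probe "digital signature" against enrolled signatures using cosine similarity; the vectors are toy values standing in for the output of a real face-embedding model.

    import numpy as np

    # Toy "digital signatures"; in a real system these vectors would come from a
    # face-embedding model applied to the captured photo or video frame.
    enrolled = {
        "alice": np.array([0.12, 0.80, 0.45, 0.31]),
        "bob": np.array([0.90, 0.05, 0.20, 0.66]),
    }
    probe = np.array([0.11, 0.79, 0.47, 0.30])  # signature computed from the new capture

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def identify(signature, database, threshold=0.95):
        """Return the best-matching identity, or None if nothing clears the threshold."""
        scores = {name: cosine_similarity(signature, sig) for name, sig in database.items()}
        best = max(scores, key=scores.get)
        return (best, scores[best]) if scores[best] >= threshold else (None, scores[best])

    print(identify(probe, enrolled))  # ('alice', ~0.9996) with these toy numbers

The threshold is the tuning knob that trades false matches against false rejections, which is where many of the concerns discussed later in this article begin.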

Read More At: The Pros and Cons of Enlisting AI for Cybersecurity

Use Cases of Facial Recognition Systems

A technology like facial recognition is broadly applicable to a number of different industries. Two of the most obvious are law enforcement and security. 

With facial recognition software, law enforcement agencies can track suspects and offenders unfortunate enough to be caught on camera, while security firms can utilize it as part of their access control measures, checking people’s faces as easily as they check people’s ID cards or badges.

Access control in general is the most common use case for facial recognition so far. It generally relies on a smaller database (i.e. the people allowed inside a specific building), meaning the AI is less likely to hit a false positive or a similar error. Plus, it’s such a broad use case that almost any industry imaginable could find a reason to implement the technology.

Facial recognition is also a hot topic in the education field, especially in the U.S., where vendors pitch facial recognition surveillance systems as a potential solution to the school shootings that plague the country more than any other. It also has uses in virtual classroom platforms as a way to track student activity and other metrics.

In healthcare, facial recognition can theoretically be combined with emergent tech like emotion recognition for improved patient insights, such as detecting pain or monitoring a patient's health status. It can also be used during check-in as a no-contact alternative to traditional procedures.

The world of banking saw an increase in facial recognition adoption during the COVID-19 pandemic, as financial institutions looked for new ways to safely verify customers’ identities.

Some workplaces already use facial recognition as part of their clock-in-clock-out procedures. It’s also seen as a way to monitor employee productivity and activity, preventing folks from “sleeping on the job,” as it were. 

Companies like HireVue were developing software that used facial analysis to help determine the hireability of prospective employees. However, the firm discontinued the facial analysis portion of its software in 2021, citing public concerns over AI and the diminishing value of the visual components to its assessments.

Businesses that sell age-restricted products, such as bars or grocery stores with liquor licenses, could use facial recognition to better prevent underage customers from buying those products.

Who Develops Facial Recognition Systems?

The people developing FRS are many of the same usual suspects who push other areas of tech research forward. As always, academics are some of the primary contributors to facial recognition innovation; the field got its start in academia in the 1960s with researchers like Woody Bledsoe.

In a modern day example, The Chinese University of Hong Kong created the GaussianFace algorithm in 2014, which its researchers reported had surpassed human levels of facial recognition. The algorithm scored 98.52% accuracy compared to the 97.53% accuracy of human performance.

In the corporate world, tech giants like Google, Facebook, Microsoft, IBM, and Amazon have been just some of the names leading the charge.

Google’s facial recognition is utilized in its Photos app, which infamously mislabeled a picture of software engineer Jacky Alciné and his friend, both of whom are black, as “gorillas” in 2015. To combat this, the company simply blocked “gorilla” and similar categories like “chimpanzee” and “monkey” on Photos.

Amazon was even selling its facial recognition system, Rekognition, to law enforcement agencies until 2020, when the company banned police use of the software. The ban is still in effect as of this writing.

Facebook used facial recognition technology on its social media platform for much of the platform’s lifespan. However, the company shuttered the software in late 2021 as “part of a company-wide move to limit the use of facial recognition in [its] products.”

Additionally, there are firms that specialize in facial recognition software, like Kairos, Clearview AI, and FaceFirst, contributing their knowledge and expertise to the field.

Read More At: The Value of Emotion Recognition Technology

Is This a Problem?

To answer the question of “should we be worried about facial recognition systems,” it will be best to look at some of the arguments that proponents and opponents of facial recognition commonly use.

Why Use Facial Recognition?

The most common argument in favor of facial recognition software is that it provides more security for everyone involved. In enterprise use cases, employers can better manage access control, while lowering the chance of employees becoming victims of identity theft.

Law enforcement officials say the use of FRS can aid their investigative abilities, helping them catch perpetrators quickly and more accurately. It can also be used to track victims of human trafficking, as well as individuals who might not be able to communicate, such as people with dementia. This, in theory, could reduce the number of police-caused deaths in cases involving these individuals.

Human trafficking and sex-related crimes are an oft-repeated refrain from proponents of this technology in law enforcement. Vermont, the state with the strictest ban on facial recognition, peeled back its ban slightly to allow for the technology's use in investigating child sex crimes.

For banks, facial recognition could reduce the likelihood and frequency of fraud. With biometric data like what facial recognition requires, criminals can't simply steal a password or a PIN and gain full access to a customer's life savings. This would go a long way toward stopping a category of crime for which the FTC received 2.8 million consumer reports in 2021 alone.

Finally, some proponents say, the technology is so accurate now that the worries over false positives and negatives should barely be a concern. According to a 2022 report by the National Institute of Standards and Technology, top facial recognition algorithms can have a success rate of over 99%, depending on the circumstances.
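
To put a headline accuracy figure in perspective, a quick back-of-the-envelope calculation shows what even a small error rate means in absolute terms once a system runs at scale; the numbers below are hypothetical and are not drawn from the NIST report.

    # Hypothetical figures for illustration only; not taken from the NIST report.
    accuracy = 0.995               # assume a 99.5% success rate
    searches_per_year = 1_000_000  # assume one million probe searches per year

    expected_errors = searches_per_year * (1 - accuracy)
    print(f"Expected erroneous results per year: {expected_errors:,.0f}")  # 5,000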

With accuracy that good and use cases that strong, facial recognition might just be one of the fairest and most effective technologies we can use in education, business, and law enforcement, right? Not so fast, say the technology’s critics.

Why Ban Facial Recognition Technology?

First, the accuracy of these systems isn't the primary concern for many critics of FRS. Whether the technology is accurate or not is beside the point.

While academia is where much facial recognition research is conducted, it is also where many of the concerns and criticisms are raised about the technology's use in areas of life such as education and law enforcement.

Northeastern University Professor of Law and Computer Science Woodrow Hartzog is one of the most outspoken critics of the technology. In a 2018 article Hartzog said, “The mere existence of facial recognition systems, which are often invisible, harms civil liberties, because people will act differently if they suspect they’re being surveilled.”

The concerns over privacy are numerous. As AI ethics researcher Rosalie A. Waelen put it in a 2022 piece for AI & Ethics, “[FRS] is expected to become omnipresent and able to infer a wide variety of information about a person.” The information it is meant to infer is not necessarily information an individual is willing to disclose.

Facial recognition technology has demonstrated difficulties identifying individuals of diverse races, ethnicities, genders, and ages. When used by law enforcement, this can potentially lead to false arrests, imprisonment, and other harms.

As a matter of fact, it already has. In Detroit, Robert Williams, a black man, was incorrectly identified by facial recognition software as a watch thief and falsely arrested in 2020. After being detained for 30 hours, he was released due to insufficient evidence after it became clear that the photographed suspect and Williams were not the same person.

This wasn’t the only time this happened in Detroit either. Michael Oliver was wrongly picked by facial recognition software as the one who threw a teacher’s cell phone and broke it.

A similar case happened to Nijeer Parks in late 2019 in New Jersey. Parks was detained for 10 days for allegedly shoplifting candy and trying to hit police with a car. Facial recognition falsely identified him as the perpetrator, despite Parks being 30 miles away from the incident at the time. 

There is also, in critics’ minds, an inherently dehumanizing element to facial recognition software and the way it analyzes the individual. Recall the aforementioned incident wherein Google Photos mislabeled Jacky Alciné and his friend as “gorillas.” It didn’t even recognize them as human. Given Google’s response to the situation was to remove “gorilla” and similar categories, it arguably still doesn’t.

Finally, there comes the issue of what would happen if the technology was 100% accurate. The dehumanizing element doesn’t just go away if Photos can suddenly determine that a person of color is, in fact, a person of color. 

The way these machines see us is fundamentally different from the way we see each other, because the machines' way of seeing goes only one way. As Andrea Brighenti said, facial recognition software “leads to a qualitatively different way of seeing … [the subject is] not even fully human. Inherent in the one way gaze is a kind of dehumanization of the observed.”

In order to get an AI to recognize human faces, you have to teach it what a human is, which can, in some cases, cause it to take certain human characteristics outside of its dataset and define them as decidedly “inhuman.”

Moreover, making facial recognition technology more accurate at detecting people of color only serves to make law enforcement and business-related surveillance more effective. As researchers Nikki Stevens and Os Keyes noted in their 2021 paper for the academic journal Cultural Studies, “efforts to increase representation are merely efforts to increase the ability of commercial entities to exploit, track and control people of colour.”

Final Thoughts

Ultimately, how much one worries about facial recognition technology comes down to a matter of trust. How much trust does a person place in the police or Amazon or any random individual who gets their hands on this software and the power it provides that they will only use it “for the right reasons”?

This technology provides institutions with power, and when thinking about giving power to an organization or an institution, one of the first things to consider is the potential for abuse of that power. For facial recognition, specifically for law enforcement, that potential is quite large.

In an interview for this article, Frederic Lederer, William & Mary Law School Chancellor Professor and Director of the Center for Legal & Court Technology, shared his perspective on the potential abuses facial recognition systems could facilitate in the U.S. legal system:

“Let’s imagine we run information through a facial recognition system, and it spits out 20 [possible suspects], and we had classified those possible individuals in probability terms. We know for a fact that the system is inaccurate and even under its best circumstances could still be dead wrong.

If what happens now is that the police use this as a mechanism for focusing on people and conducting proper investigation, I recognize the privacy objections, but it does seem to me to be a fairly reasonable use.

The problem is that police officers, law enforcement folks, are human beings. They are highly stressed and overworked human beings. And what little I know of reality in the field suggests that there is a large tendency to dump all but the one with the highest probability, and let’s go out and arrest him.”

Professor Lederer believes this is a dangerous idea, however:

“…since at minimum the way the system operates, it may be effectively impossible for the person to avoid what happens in the system until and unless… there is ultimately a conviction.”

Lederer explains that the Bill of Rights guarantees individuals the right to a “speedy trial.” However, court interpretations have borne out that arrested individuals can spend a year or more in jail before the courts even begin to consider what counts as speedy.

Add to that plea bargaining:

“…Now, and I don’t have the numbers, it is not uncommon for an individual in jail pending trial to be offered the following deal: ‘plead guilty, and we’ll see you’re sentenced to the time you’ve already been [in jail] in pre-trial, and you can walk home tomorrow.’ It takes an awful lot of guts for an individual to say ‘No, I’m innocent, and I’m going to stay here as long as is necessary.’

So if, in fact, we arrest the wrong person, unless there is painfully obvious evidence that the person is not the right person, we are quite likely to have individuals who are going to serve long periods of time pending trial, and a fair number of them may well plead guilty just to get out of the process.

So when you start thinking about facial recognition error, you can’t look at it in isolation. You have to ask: ‘How will real people deal with this information and to what extent does this correlate with everything else that happens?’ And at that point, there’s some really good concerns.”

As Lederer pointed out, these abuses already happen in the system, but facial recognition could exacerbate and multiply them. The technology can perpetuate pre-existing biases and systemic failings, and even if its potential benefits are enticing, the potential harm is too present and real to ignore.

Of the viable use cases of facial recognition that have been explored, the closest thing to a “safe” use case is ID verification. However, there are plenty of equally effective ID verification methods, some of which use biometrics like fingerprints.

In reality, there might not be any “safe” use case for facial recognition technology. Any advancements in the field will inevitably aid surveillance and control functions that have been core to the technology from its very beginning.

For now, Lederer said he hasn’t come to any firm conclusions as to whether the technology should be banned. But he and privacy advocates like Hartzog will continue to watch how it’s used.

Read Next: What’s Next for Ethical AI?

The post The Toll Facial Recognition Systems Might Take on Our Privacy and Humanity appeared first on IT Business Edge.

]]>
Snowflake vs. Databricks: Big Data Platform Comparison https://www.itbusinessedge.com/business-intelligence/snowflake-vs-databricks/ Thu, 14 Jul 2022 19:16:49 +0000 https://www.itbusinessedge.com/?p=140660 The extraction of meaningful information from Big Data is a key driver of business growth. For example, the analysis of current and past product and customer data can help organizations anticipate customer demand for new products and services and spot opportunities they might otherwise miss. As a result, the market for Big Data tools is […]

The post Snowflake vs. Databricks: Big Data Platform Comparison appeared first on IT Business Edge.

]]>
The extraction of meaningful information from Big Data is a key driver of business growth.

For example, the analysis of current and past product and customer data can help organizations anticipate customer demand for new products and services and spot opportunities they might otherwise miss.

As a result, the market for Big Data tools is ever-growing. In a report last month, MarketsandMarkets predicted that the Big Data market will grow from $162.6 billion in 2021 to $273.4 billion in 2026, a compound annual growth rate (CAGR) of 11%.

A variety of purpose-built software and hardware tools for Big Data analysis are available on the market today. To make sense of all that data, the first step is acquiring a robust Big Data platform, such as Snowflake or Databricks.

Current Big Data analytics requirements have forced a major shift in Big Data warehouse and storage architecture, from the conventional block- and file-based storage architecture and relational database management systems (RDBMS) to more scalable architectures like scale-out network-attached storage (NAS), object-based storage, data lakes, and data warehouses.

Databricks and Snowflake are at the forefront of those changing data architectures. In some ways, they perform similar functions—Databricks and Snowflake both made our lists of the Top DataOps Tools and the Top Big Data Storage Products, while Snowflake also made our list of the Top Data Warehouse Tools—but there are very important differences and use cases that IT buyers need to be aware of, which we’ll focus on here.

What is Snowflake?

Snowflake logo

Snowflake for Data Lake Analytics is a cross-cloud platform that enables a modern data lake strategy. The platform improves data performance and provides secure, quick, and reliable access to data.

Snowflake’s data warehouse and data lake technology consolidates structured, semi-structured, and unstructured data onto a single platform, provides fast and scalable analytics, is simple and cost-effective, and permits safe collaboration.

Key differentiators

  • Store data in Snowflake-managed smart storage with automatic micro-partitioning, encryption at rest and in transit, and efficient compression.
  • Support multiple workloads on structured, semi-structured, and unstructured data with Java, Python, or Scala.
  • Access data from existing cloud object storage instances without having to move data.
  • Seamlessly query, process, and load data without sacrificing reliability or speed.
  • Build powerful and efficient pipelines with Snowflake’s elastic processing engine for cost savings, reliable performance, and near-zero maintenance.
  • Streamline pipeline development using SQL, Java, Python, or Scala with no additional services, clusters, or copies of data to manage.
  • Gain insights into who is accessing what data with a built-in view, Access History.
  • Automatically identify classified data with Classification, and protect it while retaining analytical value with External Tokenization and Dynamic Data Masking.
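
As a rough illustration of the SQL-plus-Python workflow the list above describes, the sketch below runs a query over a semi-structured VARIANT column using the snowflake-connector-python package; the account, credentials, and table names are placeholders, and the exact connection parameters may differ for your deployment.

    import snowflake.connector  # pip install snowflake-connector-python

    # All connection values below are placeholders for illustration.
    conn = snowflake.connector.connect(
        account="my_account",
        user="analyst",
        password="********",
        warehouse="ANALYTICS_WH",
        database="SALES_DB",
        schema="PUBLIC",
    )

    try:
        cur = conn.cursor()
        # Query structured columns alongside semi-structured JSON stored in a VARIANT column.
        cur.execute(
            "SELECT order_id, payload:customer.name::string AS customer "
            "FROM raw_orders LIMIT 10"
        )
        for order_id, customer in cur.fetchall():
            print(order_id, customer)
    finally:
        conn.close()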

Pricing: Enjoy a 30-day free trial, including $400 worth of free usage. Contact the Snowflake sales team for product pricing details.

What is Databricks?

Databricks logo

The Databricks Lakehouse Platform unifies your data warehousing and artificial intelligence (AI) use cases onto a single platform. The Big Data platform combines the best features of data lakes and data warehouses to eliminate traditional data silos and simplify the modern data stack.

Key differentiators

  • Databricks Lakehouse Platform delivers the strong governance, reliability, and performance of data warehouses along with the flexibility, openness, and machine learning (ML) support of data lakes.
  • The unified approach eliminates the traditional data silos separating analytics, data science, ML, and business intelligence (BI).
  • The Big Data platform is developed by the original creators of Apache Spark, MLflow, Koalas, and Delta Lake.
  • Databricks Lakehouse Platform is being developed on open standards and open source to maximize flexibility.
  • The multicloud platform’s common approach to security, data management, and governance helps you function more efficiently and innovate seamlessly.
  • Users can easily share data, build modern data stacks, and avoid walled gardens, with unrestricted access to more than 450 partners across the data landscape.
  • Partners include Qlik, RStudio, Tableau, MongoDB, Sparkflows, HashiCorp, Rearc Data, and TickSmith.
  • Databricks Lakehouse Platform provides a collaborative development environment for data teams.
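
To make the lakehouse idea above concrete, here is a minimal PySpark sketch that writes raw events to a Delta Lake table and then queries them; it assumes a Spark session with Delta Lake already configured, as on a Databricks cluster, and the storage path is a placeholder.

    from pyspark.sql import SparkSession

    # Assumes Delta Lake is already available to Spark (it is on a Databricks cluster).
    spark = SparkSession.builder.appName("lakehouse-sketch").getOrCreate()

    path = "/tmp/events_delta"  # placeholder storage location

    # Land raw events in an open Delta table (the data lake side).
    events = spark.createDataFrame(
        [(1, "login"), (2, "purchase"), (3, "logout")],
        ["user_id", "event_type"],
    )
    events.write.format("delta").mode("overwrite").save(path)

    # Run warehouse-style analytics over the same table (the data warehouse side).
    spark.read.format("delta").load(path).groupBy("event_type").count().show()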

Pricing: There’s a 14-day full trial in your cloud or a lightweight trial hosted by Databricks. Reach out to Databricks for pricing information.

Snowflake vs. Databricks: What Are the Differences?

Here are the criteria our analysis used to compare the two Big Data platforms:

  • Scalability
  • Integration
  • Customization
  • Ease of Deployment
  • Ease of Administration and Maintenance
  • Pricing Flexibility
  • Ability to Understand Needs
  • Quality of End-User Training
  • Ease of Integration Using Standard Application Programming Interfaces (APIs) and Tools
  • Availability of Third-Party Resources
  • Data Lake
  • Data Warehouse
  • Service and Support
  • Willingness to Recommend
  • Overall Capability Score

Choosing a Big Data Platform

Organizations need resilient and reliable Big Data management, analysis, and storage tools to extract meaningful insights from Big Data. In this guide, we explored two of the best tools in the data lake and data warehouse categories.

There are a number of other options for Big Data analytics platforms, and you should find the one that best meets your business needs. Explore tools such as Apache Hadoop, Apache HBase, and NetApp Scale-out NAS before making a purchase decision.


The post Snowflake vs. Databricks: Big Data Platform Comparison appeared first on IT Business Edge.

]]>
5 Top VCs For Data Startups https://www.itbusinessedge.com/business-intelligence/top-vcs-for-data-startups/ Tue, 12 Jul 2022 14:47:56 +0000 https://www.itbusinessedge.com/?p=140655 It seems that the boom times for venture capital are over. This is not just the sentiment of the media or analysts. Keep in mind that a variety of venture capitalists agree that the slowdown is real – and could last a few years. Just look at Sequoia. On May 16, the top VC firm […]

The post 5 Top VCs For Data Startups appeared first on IT Business Edge.

]]>
It seems that the boom times for venture capital are over. This is not just the sentiment of the media or analysts. Keep in mind that a variety of venture capitalists agree that the slowdown is real – and could last a few years.

Just look at Sequoia. On May 16, the top VC firm made a presentation to its portfolio companies entitled “Adapting to Endure.” It noted that the economy was at a “crucible moment” and founders need to be careful with cash burn rates.

Despite all this, top venture capitalists understand that some of the best opportunities come during hard times. Besides, there remain plenty of secular trends that will continue to drive growth.

One is data. There's little argument from CEOs that data is a strategic asset. However, there need to be effective tools to get value from it, and that will continue to drive investment in data startups for some time to come.

Here we’ll look at five of the top venture capital firms for data – along with some insight into where they see current investment opportunities.


Accel

Founded in 1983, Accel has invested in many categories over the years, like consumer, media, security, ecommerce and so on. But the firm has also shown strong data chops.

Its most iconic investment occurred in the summer of 2005. Accel agreed to invest $12.7 million in Facebook – which is now called Meta – for a 10.7% stake.

In terms of its enterprise data deals, they include companies like UiPath, Cloudera, Atlassian, and Slack. As for recent investments, there is the $60 million funding of Cyera. The company has built a cloud-native data security platform that evaluates, in real time, whether data on AWS, Azure, and GCP is sensitive and vulnerable to risk.

Accel just raised a mega $4 billion fund that is focused on late-stage deals, an impressive display of confidence by the firm’s limited partners (LPs). This is certainly a contrarian bet as this category of investments has softened during the past year. But with valuations much more attractive, the timing could actually be good for Accel.

Greylock

Another name with real staying power, Greylock Partners focuses on enterprise and consumer software companies. Its investments span early seed rounds to later stages. In fact, the firm will incubate some of its deals at its offices, as was the case with companies like Palo Alto Networks, Workday, and Sumo Logic.

One of Greylock's best deals was LinkedIn. The firm invested in the startup in 2004, the year after its public launch, when it had fewer than one million members.

Then in 2016, Microsoft agreed to acquire LinkedIn for $26.5 billion. Reid Hoffman, who is the cofounder of LinkedIn, is currently a partner at Greylock.

An interesting recent data startup funding is Baseten. The company's system allows for fast and easy migration of machine learning models to production applications, automating complex backend and MLOps processes. Greylock participated in the seed and Series A financings.

Sequoia

Sequoia is one of the pioneers of the venture capital industry. Don Valentine founded the firm in 1972 and raised his first fund a couple of years later. It wasn't easy, as he had to convince investors of the potential benefits of backing startups. At the time, it was a fairly radical concept for institutions.

But Valentine had a knack for finding the next big thing. For example, he was an early investor in Atari and Apple.

This was just the beginning. Sequoia would go on to have one of the best track records in venture capital. Just some of its huge winners include Snowflake, Stripe, WhatsApp, ServiceNow, Cisco, Yahoo! and Google.

No doubt, a big part of Sequoia's investment thesis is data. For example, in early June the firm led a $4.5 million seed round for CloseFactor. The startup leverages machine learning to customize sales pitches and target the right prospects, and its system has shown two- to four-times improvements in pipeline quality.

Also read: Top 7 Data Management Trends to Watch in 2022

Andreessen Horowitz

It usually takes at least a decade to become an elite venture firm. The reason is that early-stage investments generally need lots of time to generate breakout returns.

But for Andreessen Horowitz, it was able to become an elite firm within a few years. Then again, it certainly helped that its founders are visionary entrepreneurs Marc Andreessen and Ben Horowitz.

Yet they also set out to disrupt the traditional model for venture capital, operating more like a Hollywood talent agency. Andreessen Horowitz hired specialists to help entrepreneurs with many parts of their business, such as PR, sales, marketing, and design.

The formula has been a winner. Some of Andreessen Horowitz's notable investments include Stripe, Databricks, Plaid, Figma, Tanium, and GitHub. And yes, many other venture capital firms have replicated the model.

As for a recent data deal from Andreessen Horowitz, there is the $100 million Series D funding for Imply Data, which valued the company at $1.1 billion. Imply's founders are the creators of Apache Druid, an open source database for analytics applications, and the company has focused on the large market of developers building analytics applications.

Andreessen Horowitz certainly has lots of fire power for many more deals. In January, the company announced $9 billion in new capital for venture opportunities, growth stage and biotech.

Lightspeed

Lightspeed got its start in October 2000, just as the dotcom bust was setting in. But the timing would prove propitious: the firm had fresh capital, and valuations were becoming much more attractive.

In the early days, Lightspeed was focused on consumer startups. For example, it was an early investor in Snapchat. Lightspeed contributed $485,000 in the seed round.

However, during the past decade, Lightspeed has upped its game with enterprise software and infrastructure opportunities. Some of its standout deals include AppDynamics, MuleSoft, and Nutanix.

Among recent data deals for Lightspeed, Redpanda Data is one that stands out. The venture capital firm led a $50 million Series B round. Redpanda has built a streaming platform for developers. Think of it as a system of record for real-time and historical data.

In 2020, Lightspeed raised three funds for a total of $4.2 billion. The firm is now seeking about $4.5 billion for its next set of financing vehicles.

Read next: Top Artificial Intelligence (AI) Software 2022

The post 5 Top VCs For Data Startups appeared first on IT Business Edge.

]]>
Microsoft Drops Emotion Recognition as Facial Analysis Concerns Grow https://www.itbusinessedge.com/business-intelligence/microsoft-drops-emotion-recognition-facial-analysis/ Tue, 05 Jul 2022 23:38:48 +0000 https://www.itbusinessedge.com/?p=140609 Despite facial recognition technology’s potential, it faces mounting ethical questions and issues of bias. To address those concerns, Microsoft recently released its Responsible AI Standard and made a number of changes, the most noteworthy of which is to retire the company’s emotional recognition AI technology. Responsible AI Microsoft’s new policy contains a number of major […]

The post Microsoft Drops Emotion Recognition as Facial Analysis Concerns Grow appeared first on IT Business Edge.

]]>
Despite facial recognition technology’s potential, it faces mounting ethical questions and issues of bias.

To address those concerns, Microsoft recently released its Responsible AI Standard and made a number of changes, the most noteworthy of which is to retire the company’s emotional recognition AI technology.

Responsible AI

Microsoft’s new policy contains a number of major announcements.

  • New customers must apply for access to use facial recognition operations in Azure Face API, Computer Vision and Video Indexer, and existing customers have one year to apply and be approved for continued access to the facial recognition services.
  • Microsoft’s policy of Limited Access adds use case and customer eligibility requirements to access the services.
  • Facial detection capabilities—including detecting blur, exposure, glasses, head pose, landmarks, noise, occlusion, and facial bounding box—will remain generally available and do not require an application.

The centerpiece of the announcement is that the software giant “will retire facial analysis capabilities that purport to infer emotional states and identity attributes such as gender, age, smile, facial hair, hair, and makeup.”

Microsoft noted that “the inability to generalize the linkage between facial expression and emotional state across use cases, regions, and demographics…opens up a wide range of ways they can be misused—including subjecting people to stereotyping, discrimination, or unfair denial of services.”

Also read: AI Suffers from Bias—But It Doesn’t Have To

Moving Away from Facial Analysis

There are a number of reasons why major IT players have been moving away from facial recognition technologies and limiting law enforcement access to them.

Fairness concerns

Automated facial analysis and facial recognition software have always generated controversy. Combine this with the societal biases often inherent in AI systems, and the potential to exacerbate issues of bias intensifies. Many commercial facial analysis systems today inadvertently exhibit bias in categories such as race, age, culture, ethnicity, and gender. Microsoft's Responsible AI Standard implementation aims to help the company get ahead of potential issues of bias through its outlined Fairness Goals and Requirements.

Appropriate use controls

Despite Azure AI Custom Neural Voice's potential in entertainment, accessibility, and education, it could also be misused to deceive listeners by impersonating speakers. Microsoft's Responsible AI program, plus the Sensitive Uses review process central to the Responsible AI Standard, reviewed its facial recognition and Custom Neural Voice technologies to develop a layered control framework. By limiting these technologies and implementing these controls, Microsoft hopes to safeguard both the technologies and users from misuse while ensuring that their implementations deliver value.

Lack of consensus on emotions

Microsoft's decision to end public access to the emotion recognition and facial characteristics identification features of its AI stems from the lack of a clear consensus on the definition of emotions. Experts inside and outside the company have pointed out the effect of this lack of consensus on emotion recognition products, which generalize inferences across demographics, regions, and use cases. This hinders the technology's ability to provide appropriate solutions to the problems it aims to solve and ultimately undermines its trustworthiness.

The skepticism associated with the technology stems from its disputed efficacy and the justifications offered for its use. Human rights groups contend that emotion AI is discriminatory and manipulative. One study found that emotion AI consistently rated White subjects as showing more positive emotions than Black subjects across two different facial recognition software platforms.

Intensifying privacy concerns

There is increasing scrutiny of facial recognition technologies and their unethical use for public surveillance and mass face detection without consent. Even when facial analysis collects generic data that is kept anonymous, such as Azure Face's inference of identity attributes like gender, hair, and age, anonymization does not alleviate ever-growing privacy concerns. Beyond the question of consent, subjects often harbor concerns about how the data these technologies collect is stored, protected, and used.

Also read: What Does Explainable AI Mean for Your Business?

Facial Detection and Bias

Algorithmic bias occurs when machine learning algorithms reproduce the biases of their creators or their input data. The large-scale use of these models in our technology-dependent lives means their applications risk adopting and proliferating mass-produced biases.

Facial detection technologies struggle to produce accurate results for women, dark-skinned people, and older adults, as these technologies are commonly trained on facial image datasets dominated by Caucasian subjects. Bias in facial analysis and facial recognition technologies yields real-life consequences, such as the following examples.

Inaccuracy

Regardless of the strides facial detection technologies have made, bias often yields inaccurate results. Studies show that face detection generally performs better on lighter skin tones. One study reported a maximum error rate of 0.8% when identifying lighter-skinned males, compared with error rates of up to 34.7% for darker-skinned women.
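
To show how such disparities are typically measured, the snippet below computes a per-group error rate from a handful of made-up evaluation records; the data is invented purely for illustration.

    from collections import defaultdict

    # Made-up evaluation records: (demographic group, was the match correct?)
    results = [
        ("lighter-skinned men", True), ("lighter-skinned men", True),
        ("lighter-skinned men", True), ("lighter-skinned men", True),
        ("darker-skinned women", True), ("darker-skinned women", False),
        ("darker-skinned women", False), ("darker-skinned women", True),
    ]

    totals, errors = defaultdict(int), defaultdict(int)
    for group, correct in results:
        totals[group] += 1
        if not correct:
            errors[group] += 1

    for group in totals:
        print(f"{group}: error rate {errors[group] / totals[group]:.1%}")
    # A disparity like the one reported (0.8% vs. 34.7%) shows up as wildly different
    # per-group error rates even when the overall average looks acceptable.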

The failures in recognizing the faces of dark-skinned people have led to instances where the technology has been used wrongly by law enforcement. In February 2019, a Black man was accused of not only shoplifting but also attempting to hit a police officer with a car even though he was forty miles away from the scene of the crime at the time. He spent 10 days in jail and his defense cost him $5,000.

Since the case was dismissed for lack of evidence in November 2019, the man is suing the authorities involved for false arrest, false imprisonment, and civil rights violations. In a similar case, another man was wrongfully arrested as a result of an inaccurate facial recognition match. Such inaccuracies raise concerns about how many other wrongful arrests and convictions may have taken place.

Several vendors of the technology, such as IBM, Amazon, and Microsoft, are aware of these limitations and of the technology's implications for racial injustice, and they have moved to prevent potential misuse of their software in areas like law enforcement. Microsoft's policy prohibits the use of Azure Face by or for state police in the United States.

Decision making

It is not uncommon to find facial analysis technology being used to assist in the evaluation of video interviews with job candidates. These tools influence recruiters’ hiring decisions using data they generate by analyzing facial expressions, movements, choice of words, and vocal tone. Such use cases are meant to lower hiring costs and increase efficiency by expediting the screening and recruitment of new hires.

However, failure to train such algorithms on datasets that are both large enough and diverse enough introduces bias. That bias may deem certain people more suitable for employment than others, and false positives or negatives may lead to hiring an unsuitable candidate or rejecting the most suitable one. As long as the algorithms contain bias, similar results are likely in any context where the technology is used to make decisions based on people's faces.

What’s Next for Facial Analysis?

None of this means Microsoft is discarding its facial analysis and recognition technology entirely, as the company recognizes that these capabilities can still yield value in controlled accessibility contexts. Microsoft's biometric systems such as facial recognition will be limited to partners and customers of its managed services, and facial analysis will remain available to existing users until June 30, 2023, via the Limited Access arrangement.

Limited Access only applies to users working directly with the Microsoft accounts team, and Microsoft has published a list of approved Limited Access use cases. Users have until the June 30, 2023 deadline to submit applications for approval to continue using the technology, and approved systems will be limited to use cases deemed acceptable. Additionally, a code of conduct and guardrails will be used to ensure authorized users do not misuse the technology.

The Computer Vision and Video Indexer celebrity recognition features are also subject to Limited Access, as is Video Indexer's face identification. Customers will no longer have general access to facial recognition from these two services, in addition to Azure Face API.

As a result of its review, Microsoft announced, “We are undertaking responsible data collections to identify and mitigate disparities in the performance of the technology across demographic groups and assessing ways to present this information in a way that would be insightful and actionable for our customers.”

Read next: Best Machine Learning Software

The post Microsoft Drops Emotion Recognition as Facial Analysis Concerns Grow appeared first on IT Business Edge.

]]>