Thursday, December 22, 2011

Social Internet Computing Fabric - Activity Streams

According to Gartner, we are currently experiencing four megatrends in the world of internet computing that will change the shape of the IT landscape forever.  Those key trends are: Social, Mobility, Cloud, and Context-Aware Computing.

If this is true, then I believe that anyone developing an application today should consider how each of these aspects will be woven into its core fabric.  In this article I will touch on what I believe is a core piece of the social computing fabric – Activity Streams.

We’ve come to know Activity Streams through their implementation in common social web applications such as Facebook and LinkedIn. Twitter has recently started exposing Activity Streams through its Activity Tab, and Facebook even offers a plugin for embedding its Activity Feed outside of its own website.

As you plan your own architecture you may come to think of designing a custom Activity Stream to display recent activities to users of your site.  You would start planning by thinking about the different types of activities that can occur like so:

image

Here we see that a user can perform many actions within the site and that we may wish to display each of these in an Activity Stream.  When the user performs an activity, the details of that event are captured and stored in a database.  When you combine the overall amount of activity from many users within a social network, you can see that the Activity Stream becomes a useful barometer of what’s going on. 

You could also add other activities such as favouriting, liking, member profile updates, etc.  There is no limit to the number of activity types that you might want to display; it all comes down to the type of application that you are developing and what it would make sense for the members of its social networks to see.

As you design your Activity Stream, many questions will arise.  One of the questions that you will come across is that of permissions – which activities should members be able to see?

image

Some of the constraints surrounding permissions might be whether the two users are members of the same group, or whether one of the users has set specific content constraints on an individual content item.

image

Data Structure

A way to deal with this is to raise events consistently throughout your application and have a set of listeners log the details of those events into a table with columns that describe attributes for each permission boundary that you might want to filter on.  This might include filtering on attributes such as:

  • IsPublic – a flag which determines whether the content is visible to everybody
  • UserId – the identity of the user who posted the item (named AccountId in the query later in this article)
  • GroupId – the identity of the group that the content was posted to
  • FolderId – an optional identity value indicating the folder within the group that the content was posted to

Other columns that I would suggest including in your Activity Stream table are:

  • RSS Fields – Fields that allow you to display a title, date created, and description without having to run separate queries to get at that information.
  • Item Type – A value which indicates the type of item.  This could be used to display a custom icon to represent the item in the Activity Stream when rendered on a web page for example.
  • Item Id – The underlying identity of the item being represented.
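
To make the column list concrete, here is a rough sketch of what such a table might look like in T-SQL. The column names follow the GetStreamItems query later in this article (so the user identity is stored as AccountId); the data types, lengths, defaults, and key choice are my own assumptions and would need adjusting for a real application.

```sql
-- Sketch only: data types, lengths, defaults and the key are assumptions.
CREATE TABLE dbo.StreamItems
(
    StreamItemId    int IDENTITY(1,1) NOT NULL PRIMARY KEY,

    -- RSS-style display fields, so rendering needs no extra lookups
    Title           nvarchar(200)  NOT NULL,
    Description     nvarchar(1000) NULL,
    CreatedDateTime datetime       NOT NULL DEFAULT (GETUTCDATE()),

    -- What the stream entry represents
    ItemTypeId      int            NOT NULL,  -- e.g. 1 = Post, 2 = Comment
    ItemId          int            NOT NULL,  -- identity of the underlying item

    -- Permission boundaries to filter on
    IsPublic        bit            NOT NULL DEFAULT (0),
    AccountId       int            NOT NULL,  -- the user who posted the item
    GroupId         int            NULL,      -- group the item was posted to, if any
    FolderId        int            NULL       -- optional folder within the group
);
```

Keeping the permission attributes on the stream row itself is what lets a single query decide visibility without joining back to the underlying posts or comments.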

The Activity Stream data will end up looking something like the data in the following image:

image
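
Rows like these could be written by whatever listener handles your application events. As a sketch, the listener might call a small stored procedure such as the one below. The procedure name AddStreamItem and its parameter list are hypothetical, and the parameter types assume the column types are whatever you chose for your own table.

```sql
-- Hypothetical procedure that an event listener could call to log an activity.
CREATE PROCEDURE dbo.AddStreamItem
    @title       nvarchar(200),
    @description nvarchar(1000),
    @itemTypeId  int,
    @itemId      int,
    @isPublic    bit,
    @accountId   int,
    @groupId     int = NULL,   -- optional: item was not posted to a group
    @folderId    int = NULL    -- optional: item was not posted to a folder
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.StreamItems
        (Title, Description, ItemTypeId, ItemId,
         IsPublic, AccountId, GroupId, FolderId, CreatedDateTime)
    VALUES
        (@title, @description, @itemTypeId, @itemId,
         @isPublic, @accountId, @groupId, @folderId, GETUTCDATE());
END
GO  -- batch separator (SSMS / sqlcmd); ends the procedure definition

-- Example call with made-up values: account 7 posts a private item to group 10.
EXEC dbo.AddStreamItem
    @title = N'New post in the design group',
    @description = N'First draft of the wireframes',
    @itemTypeId = 1,
    @itemId = 501,
    @isPublic = 0,
    @accountId = 7,
    @groupId = 10;
```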

From this data you can craft a query that enforces permissions based on your application’s rules.  An example of such a query might look like this:

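CREATE PROCEDURE [dbo].[GetStreamItems]
    @accountId int
AS
BEGIN
    -- Return the 100 most recent items that the given account may see.
    SELECT TOP 100
        [StreamItemId], [Title], [Description], [ItemId],
        -- Translate the numeric item type into a display label.
        CASE
            WHEN [ItemTypeId] = 1 THEN 'Post'
            WHEN [ItemTypeId] = 2 THEN 'Comment'
            ELSE 'System Message'
        END AS [Item Type],
        [IsPublic], [AccountId], [GroupId], [FolderId], [CreatedDateTime]
    FROM [dbo].[StreamItems] AS A
    WHERE A.IsPublic = 1                      -- rule 1: public item
        OR A.AccountId = @accountId           -- rule 2: the caller's own item
        -- rule 3: posted to one of the caller's groups, outside any folder
        OR (A.GroupId IN (SELECT GroupId FROM dbo.AccountGroups WHERE AccountId = @accountId)
            AND A.FolderId IS NULL)
        -- rule 4: posted to one of the caller's folders
        OR A.FolderId IN (SELECT FolderId FROM dbo.AccountFolders WHERE AccountId = @accountId)
    ORDER BY A.CreatedDateTime DESC;
END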



Here we see that the permission logic is encoded in the WHERE clause at the bottom of the query.  The logic here only displays an item if any of the following rules are true:



  1. The item is publicly visible

  2. The item was created by the user who is invoking the query

  3. The user who is invoking the query is a member of the group that the item was posted to, and the item was not posted to a folder

  4. The user who is invoking the query has access to the folder that the item was posted to
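
To see the four rules in action, here is a small self-contained sketch that applies the same WHERE logic to a handful of made-up rows. Everything in it (the table variables, ids, and titles) is hypothetical sample data, and it assumes SQL Server 2008 or later for the multi-row VALUES syntax.

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @StreamItems TABLE
    (StreamItemId int, Title nvarchar(50), IsPublic bit,
     AccountId int, GroupId int NULL, FolderId int NULL);
DECLARE @AccountGroups  TABLE (AccountId int, GroupId int);
DECLARE @AccountFolders TABLE (AccountId int, FolderId int);

INSERT INTO @StreamItems
    (StreamItemId, Title, IsPublic, AccountId, GroupId, FolderId)
VALUES
    (1, N'Public post by account 2',    1, 2, NULL, NULL),  -- rule 1
    (2, N'Private post by account 1',   0, 1, NULL, NULL),  -- rule 2
    (3, N'Group 10 post, no folder',    0, 2, 10,   NULL),  -- rule 3
    (4, N'Group 10 post in folder 100', 0, 2, 10,   100),   -- no rule matches
    (5, N'Group 20 post, no folder',    0, 2, 20,   NULL),  -- no rule matches
    (6, N'Group 10 post in folder 101', 0, 2, 10,   101);   -- rule 4

-- Account 1 belongs to group 10 and has access to folder 101 only.
INSERT INTO @AccountGroups  (AccountId, GroupId)  VALUES (1, 10);
INSERT INTO @AccountFolders (AccountId, FolderId) VALUES (1, 101);

DECLARE @accountId int = 1;

SELECT A.StreamItemId, A.Title
FROM @StreamItems AS A
WHERE A.IsPublic = 1
    OR A.AccountId = @accountId
    OR (A.GroupId IN (SELECT GroupId FROM @AccountGroups WHERE AccountId = @accountId)
        AND A.FolderId IS NULL)
    OR A.FolderId IN (SELECT FolderId FROM @AccountFolders WHERE AccountId = @accountId)
ORDER BY A.StreamItemId;
-- Returns items 1, 2, 3 and 6.
```

Item 4 is the interesting case: account 1 is a member of group 10, but because the item sits in folder 100, group membership alone is not enough to see it.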

The other thing to notice about the query is that only the top N rows are retrieved and that the results are ordered by the date of the item.  The reason for this is performance: the stream only ever needs the most recent entries.  For the same reason, you would create an index on the CreatedDateTime column so that the most recent rows can be read in index order instead of scanning and sorting the whole table.
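
As a sketch, the index could be created as shown below. The index names are hypothetical, and the second and third indexes on the membership tables are my own assumption about what those tables would need. Whether the optimizer actually uses these indexes for a query with several OR conditions is worth confirming against the real execution plan.

```sql
-- Lets the TOP 100 ... ORDER BY CreatedDateTime DESC read rows newest-first.
CREATE NONCLUSTERED INDEX IX_StreamItems_CreatedDateTime
    ON dbo.StreamItems (CreatedDateTime DESC);

-- Assumed supporting indexes for the membership lookups in the WHERE clause.
CREATE NONCLUSTERED INDEX IX_AccountGroups_AccountId
    ON dbo.AccountGroups (AccountId, GroupId);

CREATE NONCLUSTERED INDEX IX_AccountFolders_AccountId
    ON dbo.AccountFolders (AccountId, FolderId);
```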


Performance Characteristics


For many smaller applications, performance will not be a significant factor in designing your architecture.  And, as they say, in the world of the web, performance constraints are generally a nice problem to have (because it means that you have lots of users).  But it pays to do some analysis to see the rate at which your data will grow.  To do this, let’s assume the following set of user profiles for our website:



image


This shows user profile data for an application which has 3 segments of users:



  1. Low Use (avg. 2 activity items per day) – account for 40% of the total user base

  2. Medium Use (avg. 10.7 activity items per day) – account for 40% of the total user base

  3. High Use (avg. 23 activity items per day) – account for 20% of the total user base

Given these usage profile ratios, we can then start to calculate how many items our Activity Stream table will grow to over time based on the total number of users that we expect to have.



image


Here we can see the calculation of expected activity based on total site users of 50, 100, 1,000, and 10,000.  At 1,000 site users, our user base would generate around 2,050 database entries per day and, at 10,000 site users, this would grow to around 20,500 entries per day.


So, at the top end of these numbers, our Activity Stream table would accumulate around 2.5 million rows of data every 120 days.  This is where indexing and returning only the most recent N entries pay the biggest dividend, as you should still be able to fetch the data in a few hundred milliseconds and return it to your user in an acceptable amount of time.
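
The 120-day figure can be checked with a quick calculation. The sketch below uses only the per-day totals quoted above; the derived table and its column names are made up for the example, and the VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Rows accumulated after 120 days at the daily rates quoted in the text.
SELECT
    Users,
    EntriesPerDay,
    EntriesPerDay * 120 AS RowsAfter120Days
FROM (VALUES (1000, 2050),
             (10000, 20500)) AS Growth (Users, EntriesPerDay);
-- 1,000 users  ->   246,000 rows
-- 10,000 users -> 2,460,000 rows (roughly 2.5 million)
```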
